Related papers: MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis

URL: http://arxiv.org/abs/2601.20347v2
Date: Wed, 04 Feb 2026 07:32:11 GMT
Title: MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis
Authors: Chengying She, Chengwei Chen, Xinran Zhang, Ben Wang, Lizhuang Liu, Chengwei Shao, Yun Bian,
Abstract summary: We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines.
Score: 8.125488986754968
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1–9.8% C-index improvements compared with unimodal methods and 5.6–7.1% over multimodal alternatives.

Related papers

PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification [30.58342408480846]
PathMoE is an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs. We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset and external TCGA datasets.
arXiv Detail & Related papers (2026-03-02T07:17:44Z)
A Sparse-Attention Deep Learning Model Integrating Heterogeneous Multimodal Features for Parkinson's Disease Severity Profiling [4.813020904720317]
The Class-Weighted Sparse-Attention Fusion Network (SAFN) is an interpretable deep learning framework for robust multimodal profiling. SAFN integrates MRI cortical thickness, MRI volumetric measures, clinical assessments, and demographic variables. It achieves an accuracy of 0.98 ± 0.02 and a PR-AUC of 1.00 ± 0.00, outperforming established machine learning and deep learning baselines.
arXiv Detail & Related papers (2026-01-02T00:51:21Z)
MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology [37.556090746806845]
Multimodal Large Language Models (LLMs) hold promise for biomedical reasoning, but current benchmarks fail to capture the complexity of real-world clinical reasoning. We introduce MTBBench, an agentic benchmark simulating MTB-style decision-making through clinically challenging, multimodal, and longitudinal oncology questions. Ground truth annotations are validated by clinicians via a co-developed app, ensuring clinical relevance.
arXiv Detail & Related papers (2025-11-25T16:56:25Z)
SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction [49.355973075150075]
We introduce SurvAgent, the first hierarchical chain-of-thought (CoT)-enhanced multi-agent system for multimodal survival prediction. SurvAgent consists of two stages. The first, WSI-Gene CoT-Enhanced Case Bank Construction, employs hierarchical analysis through Low-Magnification Screening, Cross-Modal Similarity-Aware Patch Mining, and Confidence-Aware Patch Mining for pathology images. The second, Dichotomy-Based Multi-Expert Agent Inference, retrieves similar cases via RAG and integrates multimodal reports with expert predictions through progressive interval refinement.
arXiv Detail & Related papers (2025-11-20T18:41:44Z)
Meta-cognitive Multi-scale Hierarchical Reasoning for Motor Imagery Decoding [43.32839547082765]
This work investigates a hierarchical and meta-cognitive decoding framework for four-class electroencephalogram (EEG) signals. We introduce a multi-scale hierarchical signal processing module that reorganizes backbone features into temporal multi-scale representations. We instantiate this framework on three standard EEG backbones and evaluate four-class MI decoding using the BCI Competition IV-2a dataset.
arXiv Detail & Related papers (2025-11-11T06:32:23Z)
Spatially-Aware Mixture of Experts with Log-Logistic Survival Modeling for Whole-Slide Images [6.825656149756289]
We introduce a comprehensive computational pathology framework that addresses limitations through four complementary innovations. Across large TCGA cohorts, our method achieves state-of-the-art performance, yielding time-dependent concordance indices of 0.644 on LUAD, 0.751 on KIRC, and 0.752 on BRCA. The framework further provides improved calibration and interpretability, advancing the use of WSIs for personalized cancer prognosis.
arXiv Detail & Related papers (2025-11-09T08:02:15Z)
Towards a Multimodal MRI-Based Foundation Model for Multi-Level Feature Exploration in Segmentation, Molecular Subtyping, and Grading of Glioma [0.2796197251957244]
The Multi-Task S-UNETR (MTSUNET) model is a novel foundation-based framework built on the BrainSegFounder model. It simultaneously performs glioma segmentation, histological subtyping, and neuroimaging subtyping. It shows significant potential for advancing noninvasive, personalized glioma management by improving predictive accuracy and interpretability.
arXiv Detail & Related papers (2025-03-10T01:27:09Z)
Enhanced MRI Representation via Cross-series Masking [48.09478307927716]
Cross-Series Masking (CSM) is a strategy for effectively learning MRI representations in a self-supervised manner. The method achieves state-of-the-art performance on both public and in-house datasets.
arXiv Detail & Related papers (2024-12-10T10:32:09Z)
multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information [1.7359724605901228]
Brain Tumor Segmentation (BraTS) plays a critical role in clinical diagnosis, treatment planning, and monitoring the progression of brain tumors. Due to the variability in tumor appearance, size, and intensity across different MRI modalities, automated segmentation remains a challenging task. We propose a novel Transformer-based framework, multiPI-TransBTS, which integrates multi-physical information to enhance segmentation accuracy.
arXiv Detail & Related papers (2024-09-18T17:35:19Z)
Multiplex-detection Based Multiple Instance Learning Network for Whole Slide Image Classification [2.61155594652503]
Multiple instance learning (MIL) is a powerful approach to classify whole slide images (WSIs) for diagnostic pathology. We propose a novel multiplex-detection-based multiple instance learning (MDMIL) to tackle the issues above. Specifically, MDMIL is constructed from the internal query generation module (IQGM) and the multiplex detection module (MDM).
arXiv Detail & Related papers (2022-08-06T14:36:48Z)
TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification [38.58585442160062]
Multiple instance learning (MIL) is a powerful tool to solve the weakly supervised classification in whole slide image (WSI) based pathology diagnosis. We proposed a new framework, called correlated MIL, and provided a proof for convergence. We conducted various experiments for three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-02T02:57:54Z)
G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.