PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification
- URL: http://arxiv.org/abs/2603.01547v1
- Date: Mon, 02 Mar 2026 07:17:44 GMT
- Title: PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification
- Authors: Jian Yu, Joakim Nguyen, Jinrui Fang, Awais Naeem, Zeyuan Cao, Sanjay Krishnan, Nicholas Konz, Tianlong Chen, Chandra Krishnan, Hairong Wang, Edward Castillo, Ying Ding, Ankita Shukla
- Abstract summary: PathMoE is an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs. We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset and external TCGA datasets.
- Score: 30.58342408480846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate classification of pediatric central nervous system tumors remains challenging due to histological complexity and limited training data. While pathology foundation models have advanced whole-slide image (WSI) analysis, they often fail to leverage the rich, complementary information found in clinical text and tissue microarchitecture. To this end, we propose PathMoE, an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs via an interaction-aware mixture-of-experts architecture built on state-of-the-art foundation models for each modality. By training specialized experts to capture modality uniqueness, redundancy, and synergy, PathMoE employs an input-dependent gating mechanism that dynamically weights these interactions, providing sample-level interpretability. We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset (PBT) and external TCGA datasets. PathMoE improves macro-F1 from 0.762 to 0.799 (+0.037) on PBT when integrating WSI, text, and graph modalities; on TCGA, augmenting WSI with graph knowledge improves macro-F1 from 0.668 to 0.709 (+0.041). These results demonstrate significant performance gains over state-of-the-art image-only baselines while revealing the specific modality interactions driving individual predictions. This interpretability is particularly critical for rare tumor subtypes, where transparent model reasoning is essential for clinical trust and diagnostic validation.
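The interaction-aware mixture-of-experts described in the abstract can be sketched as follows. This is a minimal illustration, not the published PathMoE architecture: the embedding sizes, the concatenation-based fusion, and the random stand-in weights are all assumptions; only the overall pattern (three experts for uniqueness, redundancy, and synergy, weighted by an input-dependent softmax gate whose per-sample weights double as an interpretability signal) follows the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative dimensions (assumptions, not taken from the paper):
DIM, N_CLASSES, N_EXPERTS = 64, 4, 3    # experts: uniqueness / redundancy / synergy
FUSED = 3 * DIM                         # concatenated WSI + text + graph embeddings

# Random matrices stand in for trained expert and gate parameters.
expert_W = rng.normal(0, 0.1, (N_EXPERTS, FUSED, N_CLASSES))
gate_W = rng.normal(0, 0.1, (FUSED, N_EXPERTS))

def interaction_moe(wsi, text, graph):
    """Combine three interaction experts with an input-dependent gate."""
    x = np.concatenate([wsi, text, graph], axis=-1)   # (B, FUSED) fused input
    gate = softmax(x @ gate_W)                        # (B, 3) per-sample weights
    logits = np.einsum("bf,efc->bec", x, expert_W)    # (B, 3, C) expert outputs
    fused = (gate[:, :, None] * logits).sum(axis=1)   # (B, C) gated combination
    return fused, gate                                # gate = interpretability signal

wsi, text, graph = (rng.normal(size=(2, DIM)) for _ in range(3))
logits, gate = interaction_moe(wsi, text, graph)
print(logits.shape, gate.shape)  # (2, 4) (2, 3)
```

Because the gate weights sum to one per sample, they can be read directly as the fraction of each prediction attributable to the uniqueness, redundancy, or synergy expert, which is the sample-level interpretability the abstract refers to.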
Related papers
- A multimodal vision foundation model for generalizable knee pathology [40.03838145472935]
Musculoskeletal disorders create an urgent demand for precise interpretation of medical imaging. Current artificial intelligence approaches in orthopedics rely on task-specific, supervised learning paradigms. We introduce OrthoFoundation, a multimodal vision foundation model optimized for musculoskeletal pathology.
arXiv Detail & Related papers (2026-01-26T08:14:51Z) - A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models [45.285970665585914]
We propose a comprehensive framework for continual learning. We employ a multi-modal, multi-layer RAG system that provides real-time guidance for model fine-tuning. We introduce a dynamic knowledge distillation framework.
arXiv Detail & Related papers (2025-12-15T08:09:40Z) - Morphology-Aware KOA Classification: Integrating Graph Priors with Vision Models [13.437469558862084]
We propose a novel framework that combines anatomical structure with radiographic features. Our approach enforces alignment between geometry-informed graph embeddings and radiographic features. Experiments on the Osteoarthritis Initiative dataset demonstrate that our approach surpasses single-modality baselines by up to 10% in accuracy.
arXiv Detail & Related papers (2025-10-20T17:20:19Z) - A Novel Multi-branch ConvNeXt Architecture for Identifying Subtle Pathological Features in CT Scans [1.2461503242570642]
This paper introduces a novel multi-branch ConvNeXt architecture designed specifically for the nuanced challenges of medical image analysis. The proposed model incorporates a rigorous end-to-end pipeline, from meticulous data preprocessing and augmentation to a disciplined two-phase training strategy. Experimental results demonstrate superior performance on the validation set, achieving a final ROC-AUC of 0.9937, a validation accuracy of 0.9757, and an F1-score of 0.9825 for COVID-19 cases.
arXiv Detail & Related papers (2025-10-10T08:00:46Z) - A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion [5.15423063632115]
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection. Existing AI approaches fall short by focusing on single-view inputs or single-task outputs. We propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views.
arXiv Detail & Related papers (2025-07-22T18:52:18Z) - Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts. A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z) - Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology [6.418265127069878]
We propose the use of omic embeddings during early and late fusion to capture complementary information from local (patch-level) to global (slide-level) interactions. This dual fusion strategy enhances interpretability and classification performance, highlighting its potential for clinical diagnostics.
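The outer-arithmetic fusion idea behind this paper can be sketched briefly. The snippet below is an assumption-laden illustration of the general technique (pairwise outer product and outer sum of two modality embeddings, flattened into one joint vector), not the paper's exact block; dimensions and the choice of operations are illustrative only.

```python
import numpy as np

def outer_fusion(slide_emb, omics_emb):
    """Sketch of outer-arithmetic fusion of two modality embeddings.

    Builds every pairwise multiplicative and additive interaction between
    the slide and omics features, then flattens them into one vector.
    Illustrative only; not the published block.
    """
    outer_prod = np.outer(slide_emb, omics_emb)          # (d1, d2) multiplicative terms
    outer_sum = slide_emb[:, None] + omics_emb[None, :]  # (d1, d2) additive terms
    return np.concatenate([outer_prod.ravel(), outer_sum.ravel()])

fused = outer_fusion(np.ones(4), np.arange(3.0))
print(fused.shape)  # (24,)
```

The appeal of this style of fusion is that every cross-modal feature pair gets an explicit interaction term, at the cost of a joint vector whose size grows with the product of the two embedding dimensions.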
arXiv Detail & Related papers (2024-11-26T13:25:53Z) - UNICORN: A Deep Learning Model for Integrating Multi-Stain Data in Histopathology [2.9389205138207277]
UNICORN is a multi-modal transformer capable of processing multi-stain histopathology for atherosclerosis severity class prediction.
The architecture comprises a two-stage, end-to-end trainable model with specialized modules utilizing transformer self-attention blocks.
UNICORN achieved a classification accuracy of 0.67, outperforming other state-of-the-art models.
arXiv Detail & Related papers (2024-09-26T12:13:52Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.