Related papers: A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification

A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification

URL: http://arxiv.org/abs/2602.00214v1
Date: Fri, 30 Jan 2026 15:21:31 GMT
Title: A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification
Authors: Juan A. Olmos, Antoine Manzanera, Fabio Martínez,
Abstract summary: Prostate cancer (PCa) is one of the most common cancers in men worldwide.<n>Most existing computer-aided diagnosis methods focus on imaging-based models.<n>We propose a multimodal geometric Foundation Model (FM) that learns representations from bp-MRI and clinical reports.
Score: 6.053648545114842
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prostate cancer (PCa) is one of the most common cancers in men worldwide. Bi-parametric MRI (bp-MRI) and clinical variables are crucial for PCa identification and improving treatment decisions. However, this process is subjective to expert interpretations. Furthermore, most existing computer-aided diagnosis methods focus on imaging-based models, overlooking the clinical context and suffering from data scarcity, limiting their ability to learn robust representations. We propose a geometric multimodal Foundation Model (FM), named MFM-Geom, that learns representations from bp-MRI and clinical reports, encoding visual findings and information from the context of clinical variables. In the representations classification head, the approach leverages symmetric positive definite (SPD) matrices and Riemannian deep learning to integrate imaging-text representations from a biomedical multimodal FM. Using 10% of the training data, MFM-Geom outperformed baseline class token embedding-based classification (+8.3%, AUC-PR of 90.67). Generalization on external dataset confirmed the robustness of fine-tuning biomedical FM, achieving an AUC-PR of 90.6.

Related papers

PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification [30.58342408480846]
PathMoE is an interpretable multimodal framework that integrates H&E slides, pathology reports, and nuclei-level cell graphs.<n>We evaluate our framework on two dataset-specific classification tasks on an internal pediatric brain tumor dataset and external TCGA datasets.
arXiv Detail & Related papers (2026-03-02T07:17:44Z)
brat: Aligned Multi-View Embeddings for Brain MRI Analysis [36.795218160666266]
brat is a multi-view representation learning framework for brain magnetic resonance imaging (MRI) trained on MRIs paired with clinical reports.<n>Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume.
arXiv Detail & Related papers (2025-12-21T10:37:31Z)
Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting [10.376219551996792]
Mammo-FM is the first foundation model specifically for mammography, pretrained on the largest and most diverse dataset to date.<n>Mammo-FM provides a unified foundation for core clinical tasks in breast imaging, including cancer diagnosis, pathology localization, structured report generation, and cancer risk prognosis within a single framework.
arXiv Detail & Related papers (2025-11-28T20:41:14Z)
Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies.<n>Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs.<n>We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z)
Fusion-Based Brain Tumor Classification Using Deep Learning and Explainable AI, and Rule-Based Reasoning [0.0]
This study presents an ensemble-based deep learning framework that combines MobileNetV2 and DenseNet121 convolutional neural networks (CNNs)<n>The models were trained and evaluated on the Figshare dataset using a stratified 5-fold cross-validation protocol.<n>The ensemble achieved superior performance compared to individual CNNs, with an accuracy of 91.7%, precision of 91.9%, recall of 91.7%, and F1-score of 91.6%.
arXiv Detail & Related papers (2025-08-09T08:46:36Z)
Sensing Cardiac Health Across Scenarios and Devices: A Multi-Modal Foundation Model Pretrained on Heterogeneous Data from 1.7 Million Individuals [36.08910150609342]
We present a cardiac sensing foundation model (CSFM) that learns unified representations from vast, heterogeneous health records.<n>Our model is pretrained on an innovative multi-modal integration of data from multiple large-scale datasets.<n> CSFM consistently outperforms traditional one-modal-one-task approaches.
arXiv Detail & Related papers (2025-06-23T20:58:12Z)
BRISC: Annotated Dataset for Brain Tumor Segmentation and Classification [0.6840587119863303]
We introduce BRISC, a dataset designed for brain tumor segmentation and classification tasks, featuring high-resolution segmentation masks.<n>The dataset comprises 6,000 contrast-enhanced T1-weighted MRI scans, which were collated from multiple public datasets that lacked segmentation labels.<n>It includes three major tumor types, namely glioma, meningioma, and pituitary, as well as non-tumorous cases.
arXiv Detail & Related papers (2025-06-17T08:56:05Z)
Enhanced MRI Representation via Cross-series Masking [48.09478307927716]
Cross-Series Masking (CSM) Strategy for effectively learning MRI representation in a self-supervised manner.<n>Method achieves state-of-the-art performance on both public and in-house datasets.
arXiv Detail & Related papers (2024-12-10T10:32:09Z)
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements.<n>This approach aims to leverage the complementary information present in these modalities to enhance the accuracy of various medical applications.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction [4.659998272408215]
Early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. We suggest a multimodal fusion methodology, termed PE-MVCNet, which capitalizes on Computed Tomography Pulmonary Angiography imaging and EMR data. Our proposed model outperforms existing methodologies, corroborating that our multimodal fusion model excels compared to models that use a single data modality.
arXiv Detail & Related papers (2024-02-27T03:53:27Z)
CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H\&E stained images [42.771819949806655]
We introduce CIMIL-CRC', a framework that solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches. We assessed our CIMIL-CRC method using the average area under the curve (AUC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort.
arXiv Detail & Related papers (2024-01-29T12:56:11Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
Classification of lung cancer subtypes on CT images with synthetic pathological priors [41.75054301525535]
Cross-scale associations exist in the image patterns between the same case's CT images and its pathological images. We propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on CT images.
arXiv Detail & Related papers (2023-08-09T02:04:05Z)
Domain Transfer Through Image-to-Image Translation for Uncertainty-Aware Prostate Cancer Classification [42.75911994044675]
We present a novel approach for unpaired image-to-image translation of prostate MRIs and an uncertainty-aware training approach for classifying clinically significant PCa. Our approach involves a novel pipeline for translating unpaired 3.0T multi-parametric prostate MRIs to 1.5T, thereby augmenting the available training data. Our experiments demonstrate that the proposed method significantly improves the Area Under ROC Curve (AUC) by over 20% compared to the previous work.
arXiv Detail & Related papers (2023-07-02T05:26:54Z)
Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest. clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be intransparent and difficult to comprehend. We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.