Cross-modal ultra-scale learning with tri-modalities of renal biopsy images for glomerular multi-disease auxiliary diagnosis
- URL: http://arxiv.org/abs/2512.15171v1
- Date: Wed, 17 Dec 2025 08:07:42 GMT
- Title: Cross-modal ultra-scale learning with tri-modalities of renal biopsy images for glomerular multi-disease auxiliary diagnosis
- Authors: Kaixing Long, Danyi Weng, Yun Mi, Zhentai Zhang, Yanmeng Lu, Jian Geng, Zhitao Zhou, Liming Zhong, Qianjin Feng, Wei Yang, Lei Cao,
- Abstract summary: We propose a cross-modal ultra-scale learning network (CMUS-Net) for the auxiliary diagnosis of multiple glomerular diseases.<n>CMUS-Net utilizes multiple ultrastructural information to bridge the scale difference between nanometer and micrometer images.<n>Our method follows the conventional process of renal biopsy pathology diagnosis and, for the first time, performs automatic classification of multiple glomerular diseases.
- Score: 11.365912757366372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing a multi-modal automatic classification model based on three types of renal biopsy images can assist pathologists in glomerular multi-disease identification. However, the substantial scale difference between transmission electron microscopy (TEM) image features at the nanoscale and optical microscopy (OM) or immunofluorescence microscopy (IM) images at the microscale poses a challenge for existing multi-modal and multi-scale models in achieving effective feature fusion and improving classification accuracy. To address this issue, we propose a cross-modal ultra-scale learning network (CMUS-Net) for the auxiliary diagnosis of multiple glomerular diseases. CMUS-Net utilizes multiple ultrastructural information to bridge the scale difference between nanometer and micrometer images. Specifically, we introduce a sparse multi-instance learning module to aggregate features from TEM images. Furthermore, we design a cross-modal scale attention module to facilitate feature interaction, enhancing pathological semantic information. Finally, multiple loss functions are combined, allowing the model to weigh the importance among different modalities and achieve precise classification of glomerular diseases. Our method follows the conventional process of renal biopsy pathology diagnosis and, for the first time, performs automatic classification of multiple glomerular diseases including IgA nephropathy (IgAN), membranous nephropathy (MN), and lupus nephritis (LN) based on images from three modalities and two scales. On an in-house dataset, CMUS-Net achieves an ACC of 95.37+/-2.41%, an AUC of 99.05+/-0.53%, and an F1-score of 95.32+/-2.41%. Extensive experiments demonstrate that CMUS-Net outperforms other well-known multi-modal or multi-scale methods and show its generalization capability in staging MN. Code is available at https://github.com/SMU-GL-Group/MultiModal_lkx/tree/main.
Related papers
- Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features.<n>MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification [9.266365198478741]
We propose the Multimodal Multiscale Cross-Attention Fusion Network (MMCAF-Net)<n>This model employs a feature pyramid structure combined with an efficient 3D multi-scale convolutional attention module to extract lesion-specific features from 3D medical images.<n>The results showed a significant improvement in diagnostic accuracy, surpassing current state-of-the-art methods.
arXiv Detail & Related papers (2025-08-06T08:37:06Z) - MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.<n>MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.<n>We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z) - MultiFusionNet: Multilayer Multimodal Fusion of Deep Neural Networks for
Chest X-Ray Image Classification [16.479941416339265]
Automated systems utilizing convolutional neural networks (CNNs) have shown promise in improving the accuracy and efficiency of chest X-ray image classification.
We propose a novel deep learning-based multilayer multimodal fusion model that emphasizes extracting features from different layers and fusing them.
The proposed model achieves a significantly higher accuracy of 97.21% and 99.60% for both three-class and two-class classifications, respectively.
arXiv Detail & Related papers (2024-01-01T11:50:01Z) - Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z) - Style transfer between Microscopy and Magnetic Resonance Imaging via Generative Adversarial Network in small sample size settings [45.62331048595689]
Cross-modal augmentation of Magnetic Resonance Imaging (MRI) and microscopic imaging based on the same tissue samples is promising.<n>We tested a method for generating microscopic histological images from MRI scans of the human corpus callosum using conditional generative adversarial network (cGAN) architecture.
arXiv Detail & Related papers (2023-10-16T13:58:53Z) - Multi-scale Multi-site Renal Microvascular Structures Segmentation for
Whole Slide Imaging in Renal Pathology [4.743463035587953]
We present Omni-Seg, a novel single dynamic network method that capitalizes on multi-site, multi-scale training data.
We train a singular deep network using images from two datasets, HuBMAP and NEPTUNE.
Our proposed method provides renal pathologists with a powerful computational tool for the quantitative analysis of renal microvascular structures.
arXiv Detail & Related papers (2023-08-10T16:26:03Z) - M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation [66.89632406480949]
We propose a general multi-scale in multi-scale subtraction network (M$2$SNet) to finish diverse segmentation from medical image.<n>Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets of four different medical image segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:26:49Z) - Multi-Modal Multi-Instance Learning for Retinal Disease Recognition [10.294738095942812]
We aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case.
As both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight.
arXiv Detail & Related papers (2021-09-25T08:16:47Z) - Medulloblastoma Tumor Classification using Deep Transfer Learning with
Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end MB tumor classification and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
arXiv Detail & Related papers (2021-09-10T13:07:11Z) - MSA-MIL: A deep residual multiple instance learning model based on
multi-scale annotation for classification and visualization of glomerular
spikes [9.432314992011099]
In the biopsy microscope slide of membranous nephropathy, spikelike projections on the glomerular basement membrane is a prominent feature of the MN.
In this paper, we establish a visualized classification model based on the multi-scale annotation multi-instance learning (MSA-MIL) to achieve glomerular classification and spikes visualization.
arXiv Detail & Related papers (2020-07-02T03:49:44Z) - M2Net: Multi-modal Multi-channel Network for Overall Survival Time
Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net)
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.