Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease Classification
- URL: http://arxiv.org/abs/2508.03277v1
- Date: Tue, 05 Aug 2025 09:56:12 GMT
- Title: Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease Classification
- Authors: Hang Guo, Qing Zhang, Zixuan Gao, Siyuan Yang, Shulin Peng, Xiang Tao, Ting Yu, Yan Wang, Qingli Li
- Abstract summary: We propose an Efficient multimodal framework for Patient-level placental disease Diagnosis, named EmmPD. Our approach introduces a two-stage patch selection module that combines parameter-free and learnable compression strategies. We develop a hybrid multimodal fusion module that leverages adaptive graph learning to enhance pathological feature representation.
- Score: 20.137166016134636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate prediction of placental diseases via whole slide images (WSIs) is critical for preventing severe maternal and fetal complications. However, WSI analysis presents significant computational challenges due to the massive data volume. Existing WSI classification methods encounter critical limitations: (1) inadequate patch selection strategies that either compromise performance or fail to sufficiently reduce computational demands, and (2) the loss of global histological context resulting from patch-level processing approaches. To address these challenges, we propose an Efficient multimodal framework for Patient-level placental disease Diagnosis, named EmmPD. Our approach introduces a two-stage patch selection module that combines parameter-free and learnable compression strategies, optimally balancing computational efficiency with critical feature preservation. Additionally, we develop a hybrid multimodal fusion module that leverages adaptive graph learning to enhance pathological feature representation and incorporates textual medical reports to enrich global contextual understanding. Extensive experiments conducted on both a self-constructed patient-level Placental dataset and two public datasets demonstrate that our method achieves state-of-the-art diagnostic performance. The code is available at https://github.com/ECNU-MultiDimLab/EmmPD.
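The abstract describes a two-stage patch selection module: a parameter-free compression stage followed by a learnable one. The paper's exact criteria are not given here, so the following is only a minimal illustrative sketch of that general pattern: stage 1 uses a hypothetical norm-based saliency score to prune patches without any trained parameters, and stage 2 compresses the survivors to a fixed-size summary with learnable query tokens and cross-attention. The function and class names, the saliency proxy, and the token count are all assumptions, not EmmPD's implementation.

```python
import torch

def parameter_free_select(patch_feats, keep_ratio=0.5):
    """Stage 1 (parameter-free, assumed criterion): rank patches by the
    L2 norm of their embeddings as a saliency proxy and keep the top
    fraction. No trained parameters are involved."""
    n = patch_feats.shape[0]
    k = max(1, int(n * keep_ratio))
    scores = patch_feats.norm(dim=1)       # (n,) saliency scores
    idx = scores.topk(k).indices
    return patch_feats[idx]                # (k, dim)

class LearnableCompressor(torch.nn.Module):
    """Stage 2 (learnable, assumed design): a small set of learnable query
    tokens cross-attends to the surviving patches, compressing them to a
    fixed-size summary regardless of how many patches remain."""
    def __init__(self, dim, n_tokens=16):
        super().__init__()
        self.queries = torch.nn.Parameter(torch.randn(n_tokens, dim))
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, patch_feats):
        q = self.queries.unsqueeze(0)      # (1, n_tokens, dim)
        kv = patch_feats.unsqueeze(0)      # (1, n_patches, dim)
        out, _ = self.attn(q, kv, kv)      # queries attend over patches
        return out.squeeze(0)              # (n_tokens, dim)

# Toy pipeline: 1000 patch embeddings from one WSI -> 200 -> 16 tokens.
feats = torch.randn(1000, 256)
kept = parameter_free_select(feats, keep_ratio=0.2)
compressor = LearnableCompressor(dim=256, n_tokens=16)
summary = compressor(kept)
```

The appeal of this layout is that the cheap, parameter-free stage bounds the cost of the learnable stage: attention runs over 200 patches rather than the full slide, and the output size is fixed, which is what makes patient-level (multi-slide) fusion tractable downstream.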
Related papers
- Decentralized LoRA Augmented Transformer with Context-aware Multi-scale Feature Learning for Secured Eye Diagnosis [2.1358421658740214]
This paper proposes a novel Data-efficient Image Transformer (DeiT) based framework that integrates context-aware multiscale patch embedding, Low-Rank Adaptation (LoRA), knowledge distillation, and federated learning to address these challenges in a unified manner. The proposed model effectively captures both local and global retinal features by leveraging multi-scale patch representations with local and global attention mechanisms.
arXiv Detail & Related papers (2025-05-11T13:51:56Z) - TUMLS: Trustful Fully Unsupervised Multi-Level Segmentation for Whole Slide Images of Histology [41.94295877935867]
We present a trustful fully unsupervised multi-level segmentation methodology (TUMLS) for whole slide images (WSIs). TUMLS adopts an autoencoder (AE) as a feature extractor to identify the different tissue types within low-resolution training data. This solution integrates seamlessly into clinical workflows, transforming the examination of a whole WSI into a review of concise, interpretable cross-level insights.
arXiv Detail & Related papers (2025-04-17T07:48:05Z) - Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment [0.8458496687170665]
The increasing global prevalence of mental disorders, such as depression and PTSD, requires objective and scalable diagnostic tools. This paper investigates the potential of multimodal machine learning to address these challenges, leveraging the complementary information available in text, audio, and video data. We explore data-level, feature-level, and decision-level fusion techniques, including a novel integration of Large Language Model predictions.
arXiv Detail & Related papers (2025-04-02T14:19:06Z) - Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation [30.697291934309206]
Multimodal data is rare in real-world applications due to a lack of medical equipment and concerns about data privacy. Traditional deep learning methods typically address these issues by learning representations in latent space. The authors propose the Essence-Point and Disentangle Representation Learning (EDRL) strategy, which integrates a self-distillation mechanism into an end-to-end framework.
arXiv Detail & Related papers (2025-03-07T10:58:38Z) - Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation [53.723234136550055]
We term the new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML) and propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it. Our framework consists of a compact segmentation network with modality-specific normalization layers for learning with partially labeled unpaired multi-modal data.
arXiv Detail & Related papers (2025-03-07T07:22:42Z) - FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification [4.148491257542209]
Few-shot learning presents a critical solution for cancer diagnosis in computational pathology. A key challenge in this paradigm stems from the inherent disparity between the limited training set of whole slide images (WSIs) and the enormous number of contained patches. We introduce the knowledge-enhanced adaptive visual compression framework, dubbed FOCUS, to enable a focused analysis of diagnostically relevant regions.
arXiv Detail & Related papers (2024-11-22T05:36:38Z) - HyperMM : Robust Multimodal Learning with Varying-sized Inputs [4.377889826841039]
HyperMM is an end-to-end framework designed for learning with varying-sized inputs.
We introduce a novel strategy for training a universal feature extractor using a conditional hypernetwork.
We experimentally demonstrate the advantages of our method in two tasks: Alzheimer's disease detection and breast cancer classification.
arXiv Detail & Related papers (2024-07-30T12:13:18Z) - Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models [48.87160158792048]
We introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way.
Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods.
arXiv Detail & Related papers (2024-05-26T10:58:22Z) - Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning [67.49221252724229]
E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained with Artificial Intelligence (AI) technologies to help doctors make diagnoses.
Applying federated learning in e-health faces many challenges.
Medical data is both horizontally and vertically partitioned.
A naive combination of horizontal federated learning (HFL) and vertical federated learning (VFL) has limitations including low training efficiency, unsound convergence analysis, and a lack of parameter tuning strategies.
arXiv Detail & Related papers (2024-04-15T19:45:07Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of giga-pixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z) - Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images greatly aids the estimation of intensive care unit events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z) - Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.