MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification
- URL: http://arxiv.org/abs/2503.12401v1
- Date: Sun, 16 Mar 2025 08:04:17 GMT
- Title: MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification
- Authors: Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Yang Zhao, Hong Cheng, Huazhu Fu
- Abstract summary: Whole Slide Image (WSI) classification poses unique challenges due to the vast image size and numerous non-informative regions. We propose MExD, an Expert-Infused Diffusion Model that combines the strengths of a Mixture-of-Experts (MoE) mechanism with a diffusion model for enhanced classification.
- Score: 46.89908887119571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole Slide Image (WSI) classification poses unique challenges due to the vast image size and numerous non-informative regions, which introduce noise and cause data imbalance during feature aggregation. To address these issues, we propose MExD, an Expert-Infused Diffusion Model that combines the strengths of a Mixture-of-Experts (MoE) mechanism with a diffusion model for enhanced classification. MExD balances patch feature distribution through a novel MoE-based aggregator that selectively emphasizes relevant information, effectively filtering noise, addressing data imbalance, and extracting essential features. These features are then integrated via a diffusion-based generative process to directly yield the class distribution for the WSI. Moving beyond conventional discriminative approaches, MExD represents the first generative strategy in WSI classification, capturing fine-grained details for robust and precise results. Our MExD is validated on three widely used benchmarks (Camelyon16, TCGA-NSCLC, and BRACS), consistently achieving state-of-the-art performance in both binary and multi-class tasks.
Related papers
- DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE introduces a batch-level global token pool that enables experts to access global token distributions during training.
It achieves state-of-the-art performance among diffusion models on the ImageNet benchmark.
The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
arXiv Detail & Related papers (2025-03-18T17:57:07Z)
- Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization [28.005080560540133]
Weakly supervised temporal action localization (WS-TAL) is the task of localizing complete action instances and categorizing them using only video-level labels.
Action-background ambiguity, primarily caused by background noise resulting from aggregation and intra-action variation, is a significant challenge for existing WS-TAL methods.
We introduce a hybrid multi-head attention (HMHA) module and a generalized uncertainty-based evidential fusion (GUEF) module to address the problem.
arXiv Detail & Related papers (2024-12-27T03:04:57Z)
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration [7.087475633143941]
MM-Diff is a tuning-free image personalization framework capable of generating high-fidelity images of both single and multiple subjects in seconds.
MM-Diff employs a vision encoder to transform the input image into CLS and patch embeddings.
CLS embeddings are used on the one hand to augment the text embeddings, and on the other hand together with patch embeddings to derive a small number of detail-rich subject embeddings.
arXiv Detail & Related papers (2024-03-22T09:32:31Z)
- Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions [11.121652649243119]
Diffusion models have been widely adopted in data augmentation due to their outstanding diversity in data generation.
We propose a novel approach termed the detail reinforcement diffusion model (DRDM).
It leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components: discriminative semantic recombination (DSR) and spatial knowledge reference (SKR).
arXiv Detail & Related papers (2023-09-15T01:28:59Z)
- DifFSS: Diffusion Model for Few-Shot Semantic Segmentation [24.497112957831195]
This paper presents DifFSS, the first work to leverage the diffusion model for the FSS task.
DifFSS, a novel FSS paradigm, can further improve the performance of the state-of-the-art FSS models by a large margin without modifying their network structure.
arXiv Detail & Related papers (2023-07-03T06:33:49Z)
- Exploring Multi-Timestep Multi-Stage Diffusion Features for Hyperspectral Image Classification [16.724299091453844]
Diffusion-based HSI classification methods only utilize manually selected single-timestep single-stage features.
We propose a novel diffusion-based feature learning framework that explores Multi-Timestep Multi-Stage Diffusion features for HSI classification for the first time, called MTMSD.
Our method outperforms state-of-the-art methods for HSI classification, especially on the challenging Houston 2018 dataset.
arXiv Detail & Related papers (2023-06-15T08:56:58Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration [103.79030498369319]
A self-supervised diffusion model for hyperspectral image restoration is proposed.
DDS2M generalizes better than existing diffusion-based methods.
Experiments on HSI denoising, noisy HSI completion, and super-resolution on a variety of HSIs demonstrate DDS2M's superiority over existing task-specific state-of-the-art methods.
arXiv Detail & Related papers (2023-03-12T14:57:04Z)
- Multiscale Structure Guided Diffusion for Image Deblurring [24.09642909404091]
Diffusion Probabilistic Models (DPMs) have been employed for image deblurring.
We introduce a simple yet effective multiscale structure guidance as an implicit bias.
We demonstrate more robust deblurring results with fewer artifacts on unseen data.
arXiv Detail & Related papers (2022-12-04T10:40:35Z)