DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification
- URL: http://arxiv.org/abs/2505.23341v2
- Date: Fri, 27 Jun 2025 07:34:36 GMT
- Title: DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification
- Authors: Daoxi Cao, Hangbei Cheng, Yijin Li, Ruolin Zhou, Xuehan Zhang, Xinyi Li, Binwei Li, Xuancheng Gu, Jianan Zhang, Xueyu Liu, Yongfei Wu
- Abstract summary: Whole-slide images (WSIs) are critical for cancer diagnosis due to their ultra-high resolution and rich semantic content. We propose DSAGL (Dual-Stream Attention-Guided Learning), a novel weakly supervised classification framework that combines a teacher-student architecture with a dual-stream design.
- Score: 5.260725801393189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole-slide images (WSIs) are critical for cancer diagnosis due to their ultra-high resolution and rich semantic content. However, their massive size and the limited availability of fine-grained annotations pose substantial challenges for conventional supervised learning. We propose DSAGL (Dual-Stream Attention-Guided Learning), a novel weakly supervised classification framework that combines a teacher-student architecture with a dual-stream design. DSAGL explicitly addresses instance-level ambiguity and bag-level semantic consistency by generating multi-scale attention-based pseudo labels and guiding instance-level learning. A shared lightweight encoder (VSSMamba) enables efficient long-range dependency modeling, while a fusion-attentive module (FASA) enhances focus on sparse but diagnostically relevant regions. We further introduce a hybrid loss to enforce mutual consistency between the two streams. Experiments on CIFAR-10, NCT-CRC, and TCGA-Lung datasets demonstrate that DSAGL consistently outperforms state-of-the-art MIL baselines, achieving superior discriminative performance and robustness under weak supervision.
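The abstract mentions attention-based pseudo labels that guide instance-level learning. As a rough illustration only, the sketch below shows generic attention pooling over a bag of instance features and one simple way to turn attention weights into soft instance pseudo labels. It is not the DSAGL implementation: the linear scoring vector `w`, the min-max normalisation, and the function names are all assumptions made for this example, and the multi-scale, VSSMamba, and FASA components are not modelled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, w):
    """Pool a bag of instance feature vectors into one bag embedding.

    `w` is a hypothetical scoring vector (an assumption of this sketch);
    each instance is scored by a dot product with `w`.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in instances]
    attn = softmax(scores)
    dim = len(instances[0])
    bag = [sum(a * x[d] for a, x in zip(attn, instances)) for d in range(dim)]
    return attn, bag


def pseudo_labels(attn, bag_label):
    """Turn attention weights into soft instance pseudo labels.

    Assumption: instances in a negative bag are all negative; in a positive
    bag, attention is min-max normalised to the range [0, 1].
    """
    if bag_label == 0:
        return [0.0] * len(attn)
    lo, hi = min(attn), max(attn)
    if hi == lo:
        # Uniform attention: every instance inherits the bag label.
        return [1.0] * len(attn)
    return [(a - lo) / (hi - lo) for a in attn]


# Toy bag with three 2-dimensional instance features.
instances = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
attn, bag = attention_pool(instances, w=[1.0, 0.0])
labels = pseudo_labels(attn, bag_label=1)
```

In a teacher-student setup of the kind the abstract describes, labels like these would typically come from the teacher's attention and serve as targets for the student's instance-level predictions; the exact mechanism in DSAGL is described in the paper itself.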
Related papers
- DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
Open-Vocabulary Multi-Label Recognition (OV-MLR) aims to identify multiple seen and unseen object categories within an image. Vision-Language Pre-training models offer a strong open-vocabulary foundation, but struggle with fine-grained localization under weak supervision. We propose the Dual Adaptive Refinement Transfer (DART) framework to overcome these limitations.
arXiv Detail & Related papers (2025-08-07T17:22:33Z)
- Dual Guidance Semi-Supervised Action Detection [71.45023660211145]
We present a semi-supervised approach for spatial-temporal action localization. We introduce a dual guidance network to select better pseudo-bounding boxes. Our framework achieves superior results compared to extended image-based semi-supervised baselines.
arXiv Detail & Related papers (2025-07-28T18:08:36Z)
- MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing [10.207026975603503]
We introduce MoSAiC, a unified framework that jointly optimizes intra- and inter-modality contrastive learning with a multi-label supervised contrastive loss. MoSAiC consistently outperforms both fully supervised and self-supervised baselines in terms of accuracy, cluster coherence, and generalization.
arXiv Detail & Related papers (2025-07-11T15:33:51Z)
- Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic Segmentation [32.4542162399975]
Weakly-supervised semantic segmentation aims to assign category labels to each pixel using weak annotations. Existing methods degrade significantly in low-light environments due to severe image quality degradation and the inherent constraints of weak supervision. We propose Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic Segmentation (DGKD-WLSS).
arXiv Detail & Related papers (2025-07-10T09:28:54Z)
- Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
- HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation [2.964206587462833]
A novel semi-supervised segmentation framework, called HDC, is proposed, incorporating adaptive consistency learning with a single-teacher architecture. The framework introduces a hierarchical distillation mechanism with two objectives: Correlation Guidance Loss for aligning feature representations and Mutual Information Loss for stabilizing noisy student learning.
arXiv Detail & Related papers (2025-04-14T04:52:24Z)
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. MUCA uses a Cross-Teacher-Student attention mechanism to guide the student network in constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
- SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization [2.1682783789464968]
Fine-grained Action Recognition (FAR) focuses on detailed semantic labels within shorter temporal duration. Given the high costs of annotating labels and the substantial data needed for fine-tuning LLMs, we propose to adopt semi-supervised learning (SSL). Our framework, SeFAR, incorporates several innovative designs to tackle these challenges.
arXiv Detail & Related papers (2025-01-02T13:12:12Z)
- Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems [2.0179223501624786]
This paper presents TCVADS (Two-stage Cross-modal Video Anomaly Detection System), which leverages knowledge distillation and cross-modal contrastive learning. Experimental results demonstrate that TCVADS significantly outperforms existing methods in model performance, detection efficiency, and interpretability.
arXiv Detail & Related papers (2024-12-28T16:24:35Z)
- Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios. Existing SHGL methods encounter two significant limitations. We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z)
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- Intrapartum Ultrasound Image Segmentation of Pubic Symphysis and Fetal Head Using Dual Student-Teacher Framework with CNN-ViT Collaborative Learning [1.5233179662962222]
The segmentation of the pubic symphysis and fetal head (PSFH) constitutes a pivotal step in monitoring labor progression and identifying potential delivery complications.
Traditional semi-supervised learning approaches primarily utilize a unified network model based on Convolutional Neural Networks (CNNs).
We introduce a novel framework, the Dual-Student and Teacher Combining CNN and Transformer (DSTCT).
arXiv Detail & Related papers (2024-09-11T00:57:31Z)
- Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification [59.99976102069976]
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning. This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories.
arXiv Detail & Related papers (2024-03-13T05:48:58Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Deep Attention-guided Graph Clustering with Dual Self-supervision [49.040136530379094]
We propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC).
We develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo supervision loss.
Our method consistently outperforms state-of-the-art methods on six benchmark datasets.
arXiv Detail & Related papers (2021-11-10T06:53:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.