Related papers: Pairing-free Group-level Knowledge Distillation for Robust Gastrointestinal Lesion Classification in White-Light Endoscopy

URL: http://arxiv.org/abs/2601.09209v1
Date: Wed, 14 Jan 2026 06:24:18 GMT
Title: Pairing-free Group-level Knowledge Distillation for Robust Gastrointestinal Lesion Classification in White-Light Endoscopy
Authors: Qiang Hu, Qimei Wang, Yingjie Guo, Qiang Li, Zhiwei Wang
Abstract summary: PaGKD operates at the group level to distill more complete and compatible knowledge across modalities. Experiments on four clinical datasets demonstrate that PaGKD consistently and significantly outperforms state-of-the-art methods.
Score: 9.859796200559805
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: White-Light Imaging (WLI) is the standard for endoscopic cancer screening, but Narrow-Band Imaging (NBI) offers superior diagnostic details. A key challenge is transferring knowledge from NBI to enhance WLI-only models, yet existing methods are critically hampered by their reliance on paired NBI-WLI images of the same lesion, a costly and often impractical requirement that leaves vast amounts of clinical data untapped. In this paper, we break this paradigm by introducing PaGKD, a novel Pairing-free Group-level Knowledge Distillation framework that enables effective cross-modal learning using unpaired WLI and NBI data. Instead of forcing alignment between individual, often semantically mismatched image instances, PaGKD operates at the group level to distill more complete and compatible knowledge across modalities. Central to PaGKD are two complementary modules: (1) Group-level Prototype Distillation (GKD-Pro) distills compact group representations by extracting modality-invariant semantic prototypes via shared lesion-aware queries; (2) Group-level Dense Distillation (GKD-Den) performs dense cross-modal alignment by guiding group-aware attention with activation-derived relation maps. Together, these modules enforce global semantic consistency and local structural coherence without requiring image-level correspondence. Extensive experiments on four clinical datasets demonstrate that PaGKD consistently and significantly outperforms state-of-the-art methods, achieving relative AUC improvements of 3.3%, 1.1%, 2.8%, and 3.2%, respectively, establishing a new direction for cross-modal learning from unpaired data.
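As a rough illustration of the group-level prototype idea described in the abstract, the following minimal sketch lets a set of shared queries attend over a group of features from each modality and then compares the resulting prototypes. It is not the authors' implementation: the dot-product attention, the mean-squared-error loss, the random features, and every name in it (`group_prototypes`, `prototype_distill_loss`, `nbi_group`, `wli_group`) are assumptions made only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_prototypes(queries, group_feats):
    """Return one prototype per query for a group of feature vectors.

    Each query attends over every feature in the group, so the group may
    contain any number of images; no image-level pairing is needed.
    Scaled dot-product attention is an assumption of this sketch.
    """
    dim = len(queries[0])
    prototypes = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(dim)
                  for f in group_feats]
        weights = softmax(scores)
        prototypes.append([sum(w * f[d] for w, f in zip(weights, group_feats))
                           for d in range(dim)])
    return prototypes


def prototype_distill_loss(student_protos, teacher_protos):
    """Mean squared error between matching prototypes (assumed loss)."""
    total, count = 0.0, 0
    for s, t in zip(student_protos, teacher_protos):
        for a, b in zip(s, t):
            total += (a - b) ** 2
            count += 1
    return total / count


rng = random.Random(0)
dim = 4
# Hypothetical shared queries, used for both modalities.
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
# Unpaired groups of different sizes: 5 NBI features, 3 WLI features.
nbi_group = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
wli_group = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

teacher = group_prototypes(queries, nbi_group)  # NBI side (teacher)
student = group_prototypes(queries, wli_group)  # WLI side (student)
loss = prototype_distill_loss(student, teacher)
print(f"prototype distillation loss: {loss:.4f}")
```

Because the prototypes are computed per group rather than per image, the two groups can differ in size and content, which is the property the abstract attributes to pairing-free distillation.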

Related papers

Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
FedBiCross: A Bi-Level Optimization Framework to Tackle Non-IID Challenges in Data-Free One-Shot Federated Learning on Medical Data [17.89045564472333]
FedBiCross is a personalized OSFL framework with three stages: (1) clustering clients by model output similarity to form coherent sub-ensembles, (2) bi-level cross-cluster optimization, and (3) personalized distillation for client-specific adaptation. Experiments on four medical image datasets demonstrate that FedBiCross consistently outperforms state-of-the-art baselines across different non-IID degrees.
arXiv Detail & Related papers (2026-01-05T08:46:11Z)
Unified Supervision For Vision-Language Modeling in 3D Computed Tomography [1.4193731654133002]
General-purpose vision-language models (VLMs) have emerged as promising tools in radiology, offering zero-shot capabilities. In high-stakes domains like diagnostic radiology, these models often lack the discriminative precision required for reliable clinical use. We introduce Uniferum, a volumetric VLM that unifies diverse supervision signals, encoded in classification labels and segmentation masks, into a single training framework.
arXiv Detail & Related papers (2025-09-01T15:30:17Z)
Holistic White-light Polyp Classification via Alignment-free Dense Distillation of Auxiliary Optical Chromoendoscopy [14.917217801444794]
This paper proposes a novel holistic classification framework that leverages full-image diagnosis without requiring polyp localization. The key innovation lies in the Alignment-free Dense Distillation (ADD) module, which enables fine-grained cross-domain knowledge distillation. Our method achieves state-of-the-art performance, relatively outperforming the other approaches by at least 2.5% and 16.2% in AUC.
arXiv Detail & Related papers (2025-05-25T21:09:58Z)
Bridged Semantic Alignment for Zero-shot 3D Medical Image Diagnosis [23.56751925900571]
3D medical images such as computed tomography (CT) are widely used in clinical practice, offering great potential for automatic diagnosis. Supervised learning-based approaches have achieved significant progress but rely heavily on extensive manual annotations. Vision-language alignment (VLA) offers a promising alternative by enabling zero-shot learning without additional annotations.
arXiv Detail & Related papers (2025-01-07T06:30:52Z)
Cross Prompting Consistency with Segment Anything Model for Semi-supervised Medical Image Segmentation [44.54301473673582]
Semi-supervised learning (SSL) has achieved notable progress in medical image segmentation. Recent developments in visual foundation models, such as the Segment Anything Model (SAM), have demonstrated remarkable adaptability. We propose a cross-prompting consistency method with segment anything model (CPC-SAM) for semi-supervised medical image segmentation.
arXiv Detail & Related papers (2024-07-07T15:43:20Z)
Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation [14.536384387956527]
We develop a novel Multi-Scale Cross Supervised Contrastive Learning framework to segment structures in medical images. Our approach contrasts multi-scale features based on ground-truth and cross-predicted labels, in order to extract robust feature representations. It outperforms state-of-the-art semi-supervised methods by more than 3.0% in Dice.
arXiv Detail & Related papers (2023-06-25T16:55:32Z)
Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation. We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks. We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
Hepatic vessel segmentation based on 3Dswin-transformer with inductive biased multi-head self-attention [46.46365941681487]
We propose a robust end-to-end vessel segmentation network called Inductive BIased Multi-Head Attention Vessel Net. We introduce voxel-wise embedding rather than patch-wise embedding to locate precise liver vessel voxels. We also propose inductive biased multi-head self-attention, which learns inductive biased relative positional embedding from absolute position embedding.
arXiv Detail & Related papers (2021-11-05T10:17:08Z)
Cross-Modality Brain Tumor Segmentation via Bidirectional Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme. Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor. The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z)
Dual-Consistency Semi-Supervised Learning with Uncertainty Quantification for COVID-19 Lesion Segmentation from CT Images [49.1861463923357]
We propose an uncertainty-guided dual-consistency learning network (UDC-Net) for semi-supervised COVID-19 lesion segmentation from CT images. Our proposed UDC-Net improves the fully supervised method by 6.3% in Dice and outperforms other competitive semi-supervised approaches by significant margins.
arXiv Detail & Related papers (2021-04-07T16:23:35Z)
Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images. The approach first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates their likelihood of malignancy and, through aggregation, generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe). We propose a versatile framework that fuses appearance-based semi-supervision, mask-based adversarial domain adaptation, and pseudo-labeling. CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
