Related papers: Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

URL: http://arxiv.org/abs/2505.09435v1
Date: Wed, 14 May 2025 14:43:31 GMT
Title: Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
Authors: Yili He, Yan Zhu, Peiyao Fu, Ruijie Yang, Tianyi Chen, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang
Abstract summary: Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. In experiments, Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification.
Score: 25.683273197557934
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. Endo-CLIP's three-stage framework--cleansing, attunement, and unification--addresses these challenges by (1) removing background frames, (2) leveraging large language models to extract clinical attributes for fine-grained contrastive learning, and (3) employing patient-level cross-attention to resolve multi-polyp ambiguities. Extensive experiments demonstrate that Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification, paving the way for more accurate and clinically relevant endoscopic analysis.
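Illustration: the abstract builds on CLIP-style contrastive learning, in which matched image-text pairs are pulled together and mismatched pairs are pushed apart. The following is a minimal sketch of the generic symmetric contrastive objective only, not Endo-CLIP's actual implementation; the function names, the toy embeddings, and the temperature of 0.07 are assumptions chosen for illustration, and the cleansing, attunement, and unification stages are not modeled.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_transposed)
    )


# Toy check: correctly paired embeddings should give a lower loss than
# embeddings whose pairing has been swapped.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned loss: {aligned:.6f}, swapped loss: {swapped:.6f}")
```

In practice the embeddings would come from trained image and text encoders and the loss would be computed with a tensor library; the pure-Python version here only makes the arithmetic explicit.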

Related papers

MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment [12.665019147690975]
MAKE is a vision-language pretraining framework for zero-shot dermatological tasks. It decomposes clinical narratives into knowledge-enhanced sub-texts. It prioritizes different sub-captions based on a clinical significance prior.
arXiv Detail & Related papers (2025-05-14T13:24:08Z)
OCL: Ordinal Contrastive Learning for Imputating Features with Progressive Labels [4.434835769977399]
We introduce a holistic imaging feature imputation method that makes it possible to leverage diverse imaging features while retaining all subjects. The proposed method promotes holistic imaging feature imputation across various modalities in a shared embedding space. In the experiments, we show that our networks deliver favorable results for statistical analysis and classification against imputation baselines.
arXiv Detail & Related papers (2025-03-03T07:23:29Z)
Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models [32.17651741681871]
We propose a Progressive Spectrum Diffusion Model (PSDM) for generating synthetic polyp images. PSDM integrates diverse clinical annotations, such as segmentation masks, bounding boxes, and colonoscopy reports, by transforming them into compositional prompts. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation.
arXiv Detail & Related papers (2025-02-25T08:22:45Z)
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM [41.398287899966995]
Current AI-assisted skin image diagnosis has achieved dermatologist-level performance in classifying skin cancer. We propose a novel Cross-Attentive Fusion framework for interpretable skin lesion diagnosis.
arXiv Detail & Related papers (2024-09-14T20:11:25Z)
CLIP in Medical Imaging: A Survey [59.429714742927956]
Contrastive Language-Image Pre-training successfully introduces text supervision to vision models. The use of CLIP has recently gained increasing interest in the medical imaging domain.
arXiv Detail & Related papers (2023-12-12T15:21:57Z)
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images. We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation. We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks. We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment [57.419727894848485]
A computer-aided diagnosis system is required to assist accurate diagnosis from colonoscopy images. Most previous studies attempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images. We propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification.
arXiv Detail & Related papers (2021-08-05T09:31:46Z)
Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images. We first identify all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimate their likelihood of malignancy, and, through aggregation, also generate an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest. Clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be opaque and difficult to comprehend. We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.