Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
- URL: http://arxiv.org/abs/2505.09435v1
- Date: Wed, 14 May 2025 14:43:31 GMT
- Title: Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
- Authors: Yili He, Yan Zhu, Peiyao Fu, Ruijie Yang, Tianyi Chen, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang,
- Abstract summary: Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. In experiments, Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification.
- Score: 25.683273197557934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. Endo-CLIP's three-stage framework (cleansing, attunement, and unification) addresses these challenges by (1) removing background frames, (2) leveraging large language models to extract clinical attributes for fine-grained contrastive learning, and (3) employing patient-level cross-attention to resolve multi-polyp ambiguities. Extensive experiments demonstrate that Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification, paving the way for more accurate and clinically relevant endoscopic analysis.
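For orientation, the objective Endo-CLIP builds on is CLIP's symmetric contrastive (InfoNCE) loss over a batch of paired image and text embeddings. The sketch below is the generic CLIP loss in plain Python, not Endo-CLIP's attribute-level variant, which the abstract does not specify. The function names (`clip_contrastive_loss`, `l2_normalize`, `log_softmax`) are hypothetical, and the default temperature of 0.07 is assumed here because it is the initial value reported for the original CLIP.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def clip_contrastive_loss(
    image_embs: List[Vector], text_embs: List[Vector], temperature: float = 0.07
) -> float:
    """Symmetric InfoNCE loss (generic CLIP, assumed setup, not Endo-CLIP's exact loss).

    image_embs[i] and text_embs[i] form the positive pair; every other
    image/text combination in the batch acts as a negative.
    """
    if len(image_embs) != len(text_embs) or not image_embs:
        raise ValueError("need a non-empty batch of paired image/text embeddings")
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: cross-entropy over each row, target on the diagonal.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: cross-entropy over each column, same targets.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
    swapped = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True: correctly paired embeddings give a lower loss
```

Per the abstract, the attunement stage contrasts images with LLM-extracted clinical attributes rather than raw report text. How those attributes enter the loss is not described there, so this sketch stops at the generic batch-level objective.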
Related papers
- MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment [12.665019147690975]
MAKE is a vision-language pretraining framework for zero-shot dermatological tasks. It decomposes clinical narratives into knowledge-enhanced sub-texts. It prioritizes different sub-captions based on a clinical-significance prior.
(2025-05-14T13:24:08Z)
- OCL: Ordinal Contrastive Learning for Imputating Features with Progressive Labels [4.434835769977399]
We introduce a holistic imaging feature imputation method that makes it possible to leverage diverse imaging features while retaining all subjects. The proposed method supports this holistic imputation across various modalities in a shared embedding space. In experiments, our networks deliver favorable results for statistical analysis and classification compared with imputation baselines.
(2025-03-03T07:23:29Z)
- Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models [32.17651741681871]
We propose a Progressive Spectrum Diffusion Model (PSDM) for generating synthetic polyp images. PSDM integrates diverse clinical annotations, such as segmentation masks, bounding boxes, and colonoscopy reports, by transforming them into compositional prompts. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation.
(2025-02-25T08:22:45Z)
- Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM [41.398287899966995]
Current AI-assisted skin image diagnosis has achieved dermatologist-level performance in classifying skin cancer. We propose a novel Cross-Attentive Fusion framework for interpretable skin lesion diagnosis.
(2024-09-14T20:11:25Z)
- CLIP in Medical Imaging: A Survey [59.429714742927956]
Contrastive Language-Image Pre-training successfully introduces text supervision to vision models. CLIP has recently attracted increasing interest in the medical imaging domain.
(2023-12-12T15:21:57Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
(2023-07-31T17:59:42Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
(2023-02-03T13:50:25Z)
- Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment [57.419727894848485]
A computer-aided diagnosis system is required to assist accurate diagnosis from colonoscopy images.
Most previous studies attempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images.
We propose a novel framework based on a teacher-student architecture for the accurate colorectal polyp classification.
(2021-08-05T09:31:46Z)
- Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
We first identify all lesions present in the image, regardless of sub-type or likelihood of malignancy, then estimate each lesion's likelihood of malignancy and, through aggregation, also generate an image-level likelihood of malignancy.
(2021-04-02T20:52:05Z)
- Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest.
However, clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered opaque and difficult to comprehend.
We propose a novel decision explanation scheme based on CycleGAN activation maximization, which generates high-quality visualizations of classifier decisions even on smaller data sets.
(2020-10-09T14:39:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.