AI-Assisted Colonoscopy: Polyp Detection and Segmentation using Foundation Models
- URL: http://arxiv.org/abs/2503.24138v1
- Date: Mon, 31 Mar 2025 14:20:53 GMT
- Title: AI-Assisted Colonoscopy: Polyp Detection and Segmentation using Foundation Models
- Authors: Uxue Delaquintana-Aramendi, Leire Benito-del-Valle, Aitor Alvarez-Gila, Javier Pascau, Luisa F Sánchez-Peralta, Artzai Picón, J Blas Pagador, Cristina L Saratxaga
- Abstract summary: In colonoscopy, 80% of the missed polyps could be detected with the help of Deep Learning models. In the search for algorithms capable of addressing this challenge, foundation models emerge as promising candidates. Their zero-shot or few-shot learning capabilities facilitate generalization to new data or tasks without extensive fine-tuning. A comprehensive evaluation of foundation models for polyp segmentation was conducted, assessing both detection and delimitation.
- Score: 0.10037949839020764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In colonoscopy, 80% of the missed polyps could be detected with the help of Deep Learning models. In the search for algorithms capable of addressing this challenge, foundation models emerge as promising candidates. Their zero-shot or few-shot learning capabilities facilitate generalization to new data or tasks without extensive fine-tuning, which is particularly advantageous in the medical imaging domain, where large annotated datasets for traditional training are scarce. In this context, a comprehensive evaluation of foundation models for polyp segmentation was conducted, assessing both detection and delimitation. For the study, three different colonoscopy datasets were employed to compare the performance of five different foundation models (DINOv2, YOLO-World, GroundingDINO, SAM and MedSAM) against two benchmark networks (YOLOv8 and Mask R-CNN). Results show that the success of foundation models in polyp characterization is highly dependent on domain specialization. For optimal performance in medical applications, domain-specific models are essential, and generic models require fine-tuning to achieve effective results. Through this specialization, foundation models demonstrated superior performance compared to state-of-the-art detection and segmentation models, with some models even excelling in zero-shot evaluation, outperforming fine-tuned models on unseen data.
Related papers
- Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis [4.803310914375717]
This study evaluates three vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks.
The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs.
arXiv Detail & Related papers (2025-04-22T17:20:34Z)
- Weakly supervised deep learning model with size constraint for prostate cancer detection in multiparametric MRI and generalization to unseen domains [0.90668179713299]
We show that the model achieves on-par performance with strong fully supervised baseline models.
We also observe a performance decrease for both fully supervised and weakly supervised models when tested on unseen data domains.
arXiv Detail & Related papers (2024-11-04T12:24:33Z)
- How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment [11.60167559546617]
Training AI foundation models has emerged as a promising large-scale learning approach for addressing real-world healthcare challenges.
While many of these models have been developed for tasks like disease diagnosis and tissue quantification, their readiness for deployment on some arguably simplest tasks, such as nuclei segmentation within a single organ, remains uncertain.
This paper seeks to answer this key question, "How good are we?" by thoroughly evaluating the performance of recent cell foundation models on a curated dataset.
arXiv Detail & Related papers (2024-10-31T17:00:33Z)
- Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies [8.822087602255504]
Applying large language models to the clinical domain is challenging due to the context-heavy nature of processing medical records.
This paper explores how different embedding models and pooling methods affect information retrieval for the clinical domain.
arXiv Detail & Related papers (2024-09-23T16:16:08Z)
- Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection. We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution data.
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods.
The experimental results using the Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z)
- Ensembling Handcrafted Features with Deep Features: An Analytical Study for Classification of Routine Colon Cancer Histopathological Nuclei Images [13.858624044986815]
We have used F1-measure, Precision, Recall, AUC, and Cross-Entropy Loss to analyse the performance of our approaches.
We observed from the results that the DL features ensemble brings a marked improvement in the overall performance of the model.
arXiv Detail & Related papers (2022-02-22T06:48:50Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)