EndoDINO: A Foundation Model for GI Endoscopy
- URL: http://arxiv.org/abs/2501.05488v1
- Date: Wed, 08 Jan 2025 18:57:05 GMT
- Title: EndoDINO: A Foundation Model for GI Endoscopy
- Authors: Patrick Dermyer, Angad Kalra, Matt Schwartz
- Abstract summary: We present EndoDINO, a foundation model for GI endoscopy tasks that achieves strong generalizability by pre-training on curated GI endoscopy images.
We pre-trained ViT models with 1B, 307M, and 86M parameters using datasets ranging from 100K to 10M curated images.
- Abstract: In this work, we present EndoDINO, a foundation model for GI endoscopy tasks that achieves strong generalizability by pre-training on a well-curated image dataset sampled from the largest known GI endoscopy video dataset in the literature. Specifically, we pre-trained ViT models with 1B, 307M, and 86M parameters using datasets ranging from 100K to 10M curated images. Using EndoDINO as a frozen feature encoder, we achieved state-of-the-art performance in anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring (MES) for ulcerative colitis with only simple decoder heads.
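The evaluation protocol described above keeps the pre-trained encoder frozen and trains only a simple decoder head per task. A minimal sketch of that linear-probe setup, using a hypothetical stand-in for the EndoDINO ViT encoder (any fixed feature map illustrates the protocol) and a toy 3-class landmark-classification task:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen EndoDINO ViT encoder: a fixed,
# never-updated feature map from 64-dim inputs to 16-dim features.
W_enc = rng.normal(size=(64, 16))

def frozen_encoder(x):
    return np.tanh(x @ W_enc)              # (n, 64) -> (n, 16) features

# Toy "anatomical landmark" classification data (3 classes).
X = rng.normal(size=(200, 64))
y = rng.integers(0, 3, size=200)

F = frozen_encoder(X)                      # encoder weights stay frozen
Y = np.eye(3)[y]                           # one-hot targets

# Simple decoder head: closed-form ridge-regression linear probe.
# Only W_head is "trained"; the encoder is untouched.
lam = 1e-2
W_head = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ Y)

pred = np.argmax(F @ W_head, axis=1)       # per-image class predictions
```

The same pattern extends to segmentation or MES scoring by swapping the head; the point of the protocol is that all task-specific capacity lives in the small decoder.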
Related papers
- MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities [59.61465292965639]
This paper investigates a new paradigm for leveraging generative models in medical applications.
We propose a diffusion-based data engine, termed MRGen, which enables generation conditioned on text prompts and masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- Integrating Deep Feature Extraction and Hybrid ResNet-DenseNet Model for Multi-Class Abnormality Detection in Endoscopic Images [0.9374652839580183]
The aim is to automate the identification of ten GI abnormality classes, including angioectasia, bleeding, and ulcers.
The proposed model achieves an overall accuracy of 94% across a well-structured dataset.
arXiv Detail & Related papers (2024-10-24T06:10:31Z)
- Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy [0.024999074238880488]
Video capsule endoscopy has transformed gastrointestinal endoscopy (GIE) diagnostics by offering a non-invasive method for capturing detailed images of the gastrointestinal tract.
However, its potential is limited by the sheer volume of images generated during the imaging procedure, which can take 6 to 8 hours and often produces up to 1 million images.
arXiv Detail & Related papers (2024-10-21T22:52:25Z)
- A novel open-source ultrasound dataset with deep learning benchmarks for spinal cord injury localization and anatomical segmentation [1.02101998415327]
We present an ultrasound dataset of 10,223 brightness-mode (B-mode) images consisting of sagittal slices of porcine spinal cords.
We benchmark the performance metrics of several state-of-the-art object detection algorithms to localize the site of injury.
We evaluate the zero-shot generalization capabilities of the segmentation models on human ultrasound spinal cord images.
arXiv Detail & Related papers (2024-09-24T20:22:59Z)
- Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis [36.403320243871526]
We introduce a simplified approach called Adapt foundation models with a low-rank adaptation (LoRA) technique for easier customization.
Unlike traditional fine-tuning methods, our strategy includes LoRA layers designed to absorb specific surgical domain knowledge.
Our solution demonstrates that foundation models can be adeptly adapted for capsule endoscopy diagnosis.
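The LoRA technique referenced above freezes the pretrained weights and learns only a low-rank additive update. A minimal sketch of one such adapted linear layer (shapes and initialization scheme are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with only A and B trainable."""

    def __init__(self, w_frozen, rank=4, alpha=8):
        d_out, d_in = w_frozen.shape
        self.w = w_frozen                                   # pretrained, frozen
        self.a = rng.normal(scale=0.01, size=(rank, d_in))  # trainable down-proj
        self.b = np.zeros((d_out, rank))                    # trainable up-proj
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

w_pre = rng.normal(size=(32, 64))   # stand-in for a pretrained DINOv2 weight
layer = LoRALinear(w_pre)
x = rng.normal(size=(5, 64))

base = x @ w_pre.T                  # output of the original frozen layer
out = layer(x)
```

Because B is zero-initialized, the adapted layer starts out exactly equal to the frozen pretrained layer; domain-specific knowledge then accumulates only in the small A and B matrices during fine-tuning.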
arXiv Detail & Related papers (2024-06-15T05:21:33Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
The authors train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- SAG-GAN: Semi-Supervised Attention-Guided GANs for Data Augmentation on Medical Images [47.35184075381965]
We present a data augmentation method for generating synthetic medical images using cycle-consistent Generative Adversarial Networks (GANs).
The proposed GANs-based model can generate a tumor image from a normal image, and in turn, it can also generate a normal image from a tumor image.
We train one classification model on real images with classic data augmentation methods and another classification model on synthetic images.
arXiv Detail & Related papers (2020-11-15T14:01:24Z)
- Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy [1.7579113628094125]
Gastrointestinal (GI) pathologies are periodically screened, biopsied, and resected using surgical tools.
This dataset consists of 590 annotated frames containing GI procedure tools such as snares, balloons, and biopsy forceps.
arXiv Detail & Related papers (2020-10-23T18:14:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.