Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy
- URL: http://arxiv.org/abs/2410.21302v4
- Date: Wed, 11 Dec 2024 10:58:10 GMT
- Title: Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy
- Authors: Marcel Roth, Micha V. Nowak, Adrian Krenzer, Frank Puppe
- Abstract summary: Video capsule endoscopy has transformed gastrointestinal endoscopy (GIE) diagnostics by offering a non-invasive method for capturing detailed images of the gastrointestinal tract.
However, its potential is limited by the sheer volume of images generated during the imaging procedure, which can take anywhere from 6-8 hours and often produce up to 1 million images.
- Score: 0.024999074238880488
- Abstract: Video capsule endoscopy has transformed gastrointestinal endoscopy (GIE) diagnostics by offering a non-invasive method for capturing detailed images of the gastrointestinal tract, enabling early disease detection. However, its potential is limited by the sheer volume of images generated during the imaging procedure, which can take anywhere from 6-8 hours and often produce up to 1 million images, necessitating automated analysis. Additionally, the variability of these images, combined with the need for expert annotations and the scarcity of large, high-quality labeled datasets, constrains the effectiveness of current medical image analysis models. To address this, we introduce a novel large GIE dataset, called EndoExtend24, created by merging ten existing public and private datasets, ensuring patient integrity across splits. EndoExtend24 includes over 226,000 labeled images, as well as dynamic class mappings, which allow unified training across datasets with differing labeling granularity, supporting up to 123 distinct pathological findings. Further, we propose to leverage domain adaptive pre-training of foundation models trained with self-supervision on generic image data, to adapt them to the task of GIE medical image diagnosis. Specifically, the EVA-02 model, which is based on the ViT architecture and trained on ImageNet-22k with masked image modeling (using EVA-CLIP as a MIM teacher), is pre-trained on the EndoExtend24 dataset to achieve domain adaptation, and finally trained on the Capsule Endoscopy 2024 Challenge dataset. Our model demonstrates robust performance, securing third place in the Capsule Endoscopy 2024 Challenge. We achieved a macro AUC of 0.762 and a balanced accuracy of 37.1% on the test set. These results emphasize the effectiveness of our domain-adaptive pre-training approach and the enriched EndoExtend24 dataset in advancing gastrointestinal endoscopy diagnostics.
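The abstract reports a macro AUC and a balanced accuracy but does not spell out how they are computed. The sketch below shows the standard definitions in plain Python: balanced accuracy as the mean of per-class recall, and macro AUC as the unweighted mean of one-vs-rest AUCs. The function names and the toy labels (`normal`, `polyp`, `ulcer`) are hypothetical and chosen only for illustration; the challenge's own evaluation script may differ in details such as tie handling.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def binary_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, score_rows, classes):
    """Unweighted mean of one-vs-rest AUCs; score_rows[i][k] is the score for classes[k]."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [int(t == c) for t in y_true]
        aucs.append(binary_auc(labels, [row[k] for row in score_rows]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Hypothetical, imbalanced toy labels: four "normal" frames, one "polyp", one "ulcer".
    y_true = ["normal", "normal", "normal", "normal", "polyp", "ulcer"]
    y_pred = ["normal", "normal", "normal", "normal", "normal", "ulcer"]
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"plain accuracy:    {plain:.3f}")
    print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")
```

Because every class contributes equally to both metrics, a rare finding that is always missed pulls the score down as much as a common one would. This is one reason a balanced accuracy can look low on a heavily imbalanced multi-class test set even when most frames are classified correctly.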
Related papers
- EndoDINO: A Foundation Model for GI Endoscopy [0.0]
We present EndoDINO, a foundation model for GI endoscopy tasks that achieves strong generalizability by pre-training.
We pre-trained ViT models with 1B, 307M, and 86M parameters using datasets ranging from 100K to 10M curated images.
arXiv Detail & Related papers (2025-01-08T18:57:05Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
- UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering [0.0]
The ImageCLEFmed-MEDVQA-GI-2023 challenge carried out a visual question answering task in the gastrointestinal domain.
A multimodal architecture is set up with a BERT encoder and different pre-trained vision models based on convolutional neural network (CNN) and Transformer architectures.
Our best method, which takes advantage of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score.
arXiv Detail & Related papers (2023-07-06T05:22:20Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- An Ensemble Method to Automatically Grade Diabetic Retinopathy with Optical Coherence Tomography Angiography Images [4.640835690336653]
We propose an ensemble method to automatically grade diabetic retinopathy (DR) images from the Diabetic Retinopathy Analysis Challenge (DRAC) 2022.
First, we adopt state-of-the-art classification networks and train them to grade UW-OCTA images with different splits of the available dataset.
Ultimately, we obtain 25 models, of which the top 16 are selected and ensembled to generate the final predictions.
arXiv Detail & Related papers (2022-12-12T22:06:47Z)
- Optimising Chest X-Rays for Image Analysis by Identifying and Removing Confounding Factors [49.005337470305584]
During the COVID-19 pandemic, the sheer volume of imaging performed in an emergency setting for COVID-19 diagnosis has resulted in a wide variability of clinical CXR acquisitions.
The variable quality of clinically-acquired CXRs within publicly available datasets could have a profound effect on algorithm performance.
We propose a simple and effective step-wise approach to pre-processing a COVID-19 chest X-ray dataset to remove undesired biases.
arXiv Detail & Related papers (2022-08-22T13:57:04Z)
- A Novel Automated Classification and Segmentation for COVID-19 using 3D CT Scans [5.5957919486531935]
In COVID-19 computed tomography (CT) images of the lungs, ground glass turbidity is the most common finding that requires specialist diagnosis.
Some researchers propose DL models that can stand in for diagnostic specialists in clinics that lack such expertise.
Our model achieves 94.52% accuracy in classifying lung lesions into 3 types: COVID, Pneumonia and Normal.
arXiv Detail & Related papers (2022-08-04T22:14:18Z)
- FetReg2021: A Challenge on Placental Vessel Segmentation and Registration in Fetoscopy [52.3219875147181]
Fetoscopic laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS).
The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination.
Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking.
Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fet
arXiv Detail & Related papers (2022-06-24T23:44:42Z)
- Learning from Pseudo Lesion: A Self-supervised Framework for COVID-19 Diagnosis [22.54540093657541]
Coronavirus disease 2019 (COVID-19) has spread rapidly around the world since it was first reported in December 2019.
In recent years, deep learning-based approaches have shown impressive performance in myriad image recognition tasks.
In this paper, we propose a novel self-supervised pretraining method for COVID-19 diagnosis based on pseudo-lesion generation and restoration.
arXiv Detail & Related papers (2021-06-23T11:21:30Z)
- Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
- Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19 Chest CT Images [46.844349956057776]
Coronavirus disease 2019 (COVID-19) has been spreading rapidly around the world and has had a significant impact on public health and the economy.
There is still a lack of studies on effectively quantifying the lung infection caused by COVID-19.
We propose a novel deep learning algorithm for automated segmentation of multiple COVID-19 infection regions.
arXiv Detail & Related papers (2020-04-12T16:24:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.