Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis
- URL: http://arxiv.org/abs/2406.10508v2
- Date: Sun, 30 Jun 2024 14:28:34 GMT
- Title: Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis
- Authors: Bowen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren
- Abstract summary: We introduce a simplified approach that adapts foundation models with a low-rank adaptation (LoRA) technique for easier customization.
Unlike traditional fine-tuning methods, our strategy includes LoRA layers designed to absorb specific surgical domain knowledge.
Our solution demonstrates that foundation models can be adeptly adapted for capsule endoscopy diagnosis.
- Score: 36.403320243871526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models have become prominent in computer vision, achieving notable success in various tasks. However, their effectiveness largely depends on pre-training with extensive datasets. Applying foundation models directly to small datasets of capsule endoscopy images from scratch is challenging. Pre-training on broad, general vision datasets is crucial for successfully fine-tuning our model for specific tasks. In this work, we introduce a simplified approach that adapts foundation models with a low-rank adaptation (LoRA) technique for easier customization. Our method, built on the DINOv2 foundation model, applies low-rank adaptation learning to tailor foundation models for capsule endoscopy diagnosis effectively. Unlike traditional fine-tuning methods, our strategy includes LoRA layers designed to absorb specific surgical domain knowledge. During the training process, we keep the main model (the backbone encoder) frozen and optimize only the LoRA layers and the disease classification head. We tested our method on two publicly available datasets for capsule endoscopy disease classification. Our model achieved 97.75% accuracy on the Kvasir-Capsule dataset and 98.81% on the Kvasirv2 dataset. Our solution demonstrates that foundation models can be adeptly adapted for capsule endoscopy diagnosis, highlighting that mere reliance on straightforward fine-tuning or pre-trained models from general computer vision tasks is inadequate for such specific applications.
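The training recipe described in the abstract (frozen backbone, trainable LoRA layers plus a classification component) follows the standard LoRA formulation, where a frozen weight matrix W is augmented with a trainable low-rank update B·A. A minimal NumPy sketch of one such layer is shown below; the `LoRALinear` name, dimensions, and initialization details are illustrative assumptions, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """A linear layer with a frozen base weight W and a trainable
    low-rank update of rank r: y = x @ (W + scale * B @ A).T"""

    def __init__(self, d_in, d_out, r=4, alpha=4.0):
        # Frozen pre-trained weight (stand-in for a backbone projection).
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors; B starts at zero, so training
        # begins exactly from the pre-trained behavior.
        self.A = 0.01 * rng.standard_normal((r, d_in))
        self.B = np.zeros((d_out, r))
        self.scale = alpha / r

    def forward(self, x):
        # Base path (frozen) plus the low-rank adaptation path.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would receive gradients; W stays fixed.
        return self.A.size + self.B.size
```

Because only A and B are updated, the trainable parameter count is r·(d_in + d_out) per layer instead of d_in·d_out, which is what makes this kind of adaptation practical on small capsule endoscopy datasets.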
Related papers
- RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports [19.915033191502328]
Vision-language foundation models are increasingly investigated in the fields of computer vision and natural language processing.
To address this, a CLIP-style retinal image foundation model is developed in this paper.
Our foundation model, RET-CLIP, is specifically trained on a dataset of 193,865 patients to extract general features of color fundus photographs.
arXiv Detail & Related papers (2024-05-23T03:20:51Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697,000 radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Meta Transfer of Self-Supervised Knowledge: Foundation Model in Action for Post-Traumatic Epilepsy Prediction [0.6291443816903801]
We introduce a novel training strategy for our foundation model.
We demonstrate that the proposed strategy significantly improves task performance on small-scale clinical datasets.
Results further demonstrated the enhanced generalizability of our foundation model.
arXiv Detail & Related papers (2023-12-21T07:42:49Z) - MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications.
Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples.
Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose SELDA, a novel stacking ensemble learning method that extends three domain adaptation methods.
The experimental results on the Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z) - Predicting Visual Improvement after Macular Hole Surgery: a Cautionary Tale on Deep Learning with Very Limited Data [0.0]
We investigate the potential of machine learning models for the prediction of visual improvement after macular hole surgery from preoperative data.
We end up with only 121 total samples, putting our work in the very limited data regime.
We find that all tested deep vision models are outperformed by a simple regression model on the clinical features.
arXiv Detail & Related papers (2021-09-20T12:23:04Z) - A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-based model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z) - Self-Adaptive Transfer Learning for Multicenter Glaucoma Classification in Fundus Retina Images [9.826586293806837]
We propose a self-adaptive transfer learning (SATL) strategy to fill the domain gap between multicenter datasets.
Specifically, the encoder of a DL model that is pre-trained on the source domain is used to initialize the encoder of a reconstruction model.
Results demonstrate that the proposed SATL strategy is effective in the domain adaptation task between a private and two public glaucoma diagnosis datasets.
arXiv Detail & Related papers (2021-05-07T05:20:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.