Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval
- URL: http://arxiv.org/abs/2409.09430v2
- Date: Wed, 26 Mar 2025 19:11:03 GMT
- Title: Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval
- Authors: Amirreza Mahbod, Nematollah Saeidi, Sepideh Hatamikia, Ramona Woitek,
- Abstract summary: Content-based medical image retrieval (CBMIR) depends on image features, which can be extracted automatically or semi-automatically.<n>In this study, we used several pre-trained feature extractors from well-known pre-trained convolutional neural networks (CNNs) and pre-trained foundation models.<n>Our results show that, overall, for the 2D datasets, foundation models deliver superior performance by a large margin compared to CNNs.<n>Our findings confirm that while using larger image sizes (especially for 2D datasets) yields slightly better performance, competitive CBMIR performance can still be achieved even with smaller image
- Score: 0.37478492878307323
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical image retrieval refers to the task of finding similar images for given query images in a database, with applications such as diagnosis support. While traditional medical image retrieval relied on clinical metadata, content-based medical image retrieval (CBMIR) depends on image features, which can be extracted automatically or semi-automatically. Many approaches have been proposed for CBMIR, and among them, using pre-trained convolutional neural networks (CNNs) is a widely utilized approach. However, considering the recent advances in the development of foundation models for various computer vision tasks, their application for CBMIR can also be investigated. In this study, we used several pre-trained feature extractors from well-known pre-trained CNNs and pre-trained foundation models and investigated the CBMIR performance on eight types of two-dimensional (2D) and three-dimensional (3D) medical images. Furthermore, we investigated the effect of image size on the CBMIR performance. Our results show that, overall, for the 2D datasets, foundation models deliver superior performance by a large margin compared to CNNs, with the general-purpose self-supervised model for computational pathology (UNI) providing the best overall performance across all datasets and image sizes. For 3D datasets, CNNs and foundation models deliver more competitive performance, with contrastive learning from captions for histopathology model (CONCH) achieving the best overall performance. Moreover, our findings confirm that while using larger image sizes (especially for 2D datasets) yields slightly better performance, competitive CBMIR performance can still be achieved even with smaller image sizes. Our codes to reproduce the results are available at: https://github.com/masih4/MedImageRetrieval.
Related papers
- Machine-learning for photoplethysmography analysis: Benchmarking feature, image, and signal-based approaches [1.1011387049911827]
Photoplethysmography is a widely used non-invasive physiological sensing technique, suitable for various clinical applications.
Machine learning methods are increasingly supported by machine learning methods, raising the question of the most appropriate input representation and model choice.
We address this gap in the research landscape by a comprehensive benchmarking study covering three kinds of input representations, interpretable features, image representations and raw waveforms.
arXiv Detail & Related papers (2025-02-27T10:17:16Z) - Triad: Vision Foundation Model for 3D Magnetic Resonance Imaging [3.7942449131350413]
We propose Triad, a vision foundation model for 3D MRI.
Triad adopts a widely used autoencoder architecture to learn robust representations from 131,170 3D MRI volumes.
We evaluate Triad across three tasks, namely, organ/tumor segmentation, organ/cancer classification, and medical image registration.
arXiv Detail & Related papers (2025-02-19T19:31:52Z) - Embeddings are all you need! Achieving High Performance Medical Image Classification through Training-Free Embedding Analysis [0.0]
Developing artificial intelligence (AI) and machine learning (ML) models for medical imaging typically involves extensive training and testing on large datasets.
We investigated the feasibility of replacing conventional training procedures with an embedding-based approach.
arXiv Detail & Related papers (2024-12-12T16:59:37Z) - Deep Convolutional Neural Networks on Multiclass Classification of Three-Dimensional Brain Images for Parkinson's Disease Stage Prediction [2.931680194227131]
We developed a model capable of accurately predicting Parkinson's disease stages.
We used the entire three-dimensional (3D) brain images as input.
We incorporated an attention mechanism to account for the varying importance of different slices in the prediction process.
arXiv Detail & Related papers (2024-10-31T05:40:08Z) - Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities [0.0]
This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets.
It shows that the use of pretrained models as fixed feature extractors yields poor performance irrespective of the datasets.
It is also found that deeper and more complex architectures did not necessarily result in the best performance.
arXiv Detail & Related papers (2024-08-30T04:51:19Z) - Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning [49.197385954021456]
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for visualization and subsequent analysis tasks.
To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.
Most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios.
arXiv Detail & Related papers (2024-06-10T02:20:26Z) - Boosting Medical Image Segmentation Performance with Adaptive Convolution Layer [6.887244952811574]
We propose an adaptive layer placed ahead of leading deep-learning models such as UCTransNet.
Our approach enhances the network's ability to handle diverse anatomical structures and subtle image details.
It consistently outperforms traditional CNNs with fixed kernel sizes with a similar number of parameters.
arXiv Detail & Related papers (2024-04-17T13:18:39Z) - Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and
DINOv2 in Medical Imaging Classification [7.205610366609243]
In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data.
We compared the performance of various pre-trained deep learning models, including those based on ImageNet and DINOv2.
Our findings indicate that in our clinical dataset, DINOv2's performance was not as strong as ImageNet-based pre-trained models.
arXiv Detail & Related papers (2024-02-12T11:49:08Z) - MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image
Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named as MA-SAM.
Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Understanding the Tricks of Deep Learning in Medical Image Segmentation:
Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-sourced a strong MedISeg repository, where each component has the advantage of plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z) - Enhanced Transfer Learning Through Medical Imaging and Patient
Demographic Data Fusion [0.0]
We examine the performance enhancement in classification of medical imaging data when image features are combined with associated non-image data.
We utilise transfer learning with networks pretrained on ImageNet used directly as feature extractors and fine tuned on the target domain.
arXiv Detail & Related papers (2021-11-29T09:11:52Z) - Colorectal Polyp Classification from White-light Colonoscopy Images via
Domain Alignment [57.419727894848485]
A computer-aided diagnosis system is required to assist accurate diagnosis from colonoscopy images.
Most previous studies at-tempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images.
We propose a novel framework based on a teacher-student architecture for the accurate colorectal polyp classification.
arXiv Detail & Related papers (2021-08-05T09:31:46Z) - Two layer Ensemble of Deep Learning Models for Medical Image
Segmentation [0.2699900017799093]
We propose a two-layer ensemble of deep learning models for the segmentation of medical images.
The prediction for each training image pixel made by each model in the first layer is used as the augmented data of the training image.
The prediction of the second layer is then combined by using a weights-based scheme in which each model contributes differently to the combined result.
arXiv Detail & Related papers (2021-04-10T16:52:34Z) - Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z) - Hierarchical Amortized Training for Memory-efficient High Resolution 3D
GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms state of the art in image generation.
arXiv Detail & Related papers (2020-08-05T02:33:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.