A Bag of Visual Words Model for Medical Image Retrieval
- URL: http://arxiv.org/abs/2007.09464v1
- Date: Sat, 18 Jul 2020 16:21:30 GMT
- Title: A Bag of Visual Words Model for Medical Image Retrieval
- Authors: Sowmya Kamath S and Karthik K
- Abstract summary: Bag of Visual Words (BoVW) is a technique that can be used to effectively represent intrinsic image features in vector space.
We present a MedIR approach based on the BoVW model for content-based medical image retrieval.
- Score: 0.9137554315375919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical Image Retrieval is a challenging field in visual information
retrieval, due to the multi-dimensional and multi-modal context of the
underlying content. Traditional models often fail to take the intrinsic
characteristics of data into consideration, and have thus achieved limited
accuracy when applied to medical images. The Bag of Visual Words (BoVW) is a
technique that can be used to effectively represent intrinsic image features in
vector space, so that applications like image classification and similar-image
search can be optimized. In this paper, we present a MedIR approach based on
the BoVW model for content-based medical image retrieval. As medical images are
multi-dimensional, they exhibit underlying cluster and manifold information
which enhances semantic relevance and allows for label uniformity. Hence, the
BoVW features extracted for each image are used to train a supervised machine
learning classifier based on positive and negative training images, for
extending content-based image retrieval. During experimental validation, the
proposed model performed very well, achieving a Mean Average Precision of
88.89% during top-3 image retrieval experiments.
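Concretely, the pipeline the abstract describes can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' reported configuration: the SIFT descriptor, the vocabulary size, the linear SVM, and cosine ranking within the predicted class are all assumptions standing in for unspecified details.
```python
# Minimal BoVW retrieval sketch: local descriptors -> k-means codebook ->
# normalised word histograms -> SVM class filter -> cosine ranking.
# Descriptor, vocabulary size and classifier are illustrative assumptions.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.metrics.pairwise import cosine_similarity

K = 200  # visual vocabulary size (assumed)

def sift_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = cv2.SIFT_create().detectAndCompute(img, None)
    return des if des is not None else np.empty((0, 128), np.float32)

def bovw_histogram(des, codebook):
    # Quantise each descriptor to its nearest visual word, then count.
    words = codebook.predict(des) if len(des) else np.empty(0, dtype=int)
    hist = np.bincount(words, minlength=K).astype(np.float32)
    return hist / max(hist.sum(), 1.0)

def build_index(paths, labels):
    all_des = [sift_descriptors(p) for p in paths]
    codebook = KMeans(n_clusters=K, n_init=4).fit(np.vstack(all_des))
    hists = np.stack([bovw_histogram(d, codebook) for d in all_des])
    clf = SVC(kernel="linear").fit(hists, labels)  # positive/negative supervision
    return codebook, hists, clf

def retrieve(query_path, codebook, hists, labels, clf, top_k=3):
    q = bovw_histogram(sift_descriptors(query_path), codebook)
    pred = clf.predict(q[None])[0]                 # predicted class of the query
    idx = np.where(np.asarray(labels) == pred)[0]  # search within that class
    sims = cosine_similarity(q[None], hists[idx])[0]
    return idx[np.argsort(-sims)[:top_k]]          # indices of the top-k matches
```
Restricting the ranking to the classifier's predicted class is one plausible reading of the abstract's combination of supervised learning with content-based retrieval.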
Related papers
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
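As a loose illustration of the autoregressive pre-training idea above, the sketch below assumes the volumes have already been quantised into discrete visual token ids (a hypothetical tokeniser; the paper's actual tokenisation and architecture are not given here) and trains a causally masked Transformer to predict each next token.
```python
# Hedged sketch: next-token prediction over discrete visual tokens with a
# causally masked Transformer. Vocabulary, width and depth are assumptions.
import torch
import torch.nn as nn

class ARVisualModel(nn.Module):
    def __init__(self, vocab=1024, dim=256, heads=8, layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)
        block = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):                       # ids: (batch, seq)
        T = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        causal = torch.full((T, T), float("-inf"), device=ids.device).triu(1)
        return self.head(self.encoder(x, mask=causal))

model = ARVisualModel()
ids = torch.randint(0, 1024, (2, 128))            # toy visual token sequences
logits = model(ids[:, :-1])                       # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1024), ids[:, 1:].reshape(-1))
```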
- Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling.
For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views.
For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR).
arXiv Detail & Related papers (2024-05-30T03:15:09Z)
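For the Med-ST entry above, a Mixture-of-View-Experts style fusion can be sketched as view-specific experts combined by a learned softmax gate; the layer shapes and gating scheme here are assumptions, not the paper's specification.
```python
# Hedged sketch: view-specific experts for frontal/lateral features combined
# by a learned gate. Dimensions and gating are assumptions.
import torch
import torch.nn as nn

class MixtureOfViewExperts(nn.Module):
    def __init__(self, feat_dim=512, out_dim=256):
        super().__init__()
        self.frontal_expert = nn.Linear(feat_dim, out_dim)
        self.lateral_expert = nn.Linear(feat_dim, out_dim)
        self.gate = nn.Linear(2 * feat_dim, 2)    # softmax weight per view

    def forward(self, frontal, lateral):          # (batch, feat_dim) each
        w = torch.softmax(self.gate(torch.cat([frontal, lateral], -1)), -1)
        fused = w[:, :1] * self.frontal_expert(frontal) \
              + w[:, 1:] * self.lateral_expert(lateral)
        return fused                              # gated fusion of both views

out = MixtureOfViewExperts()(torch.randn(4, 512), torch.randn(4, 512))
```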
- Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study [0.6249768559720122]
We benchmark embeddings derived from pre-trained supervised models on medical images against embeddings derived from pre-trained unsupervised models on non-medical images.
For volumetric image retrieval, we adopt a late interaction re-ranking method inspired by text matching.
arXiv Detail & Related papers (2024-05-15T13:34:07Z)
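The late-interaction re-ranking mentioned above can be sketched as ColBERT-style MaxSim scoring over per-slice embeddings; the slice-level embedder is assumed to be given.
```python
# Hedged sketch: late interaction for volumes. Score a candidate by summing,
# over query-slice embeddings, the maximum similarity to any candidate slice.
import numpy as np

def late_interaction_score(q_slices, c_slices):
    # q_slices: (nq, d), c_slices: (nc, d), rows L2-normalised.
    sim = q_slices @ c_slices.T            # pairwise slice similarities
    return sim.max(axis=1).sum()           # MaxSim, summed over query slices

def rerank(q_slices, candidates, top_k=5):
    scores = [late_interaction_score(q_slices, c) for c in candidates]
    return np.argsort(scores)[::-1][:top_k]   # best candidates first
```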
- Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images [0.6827423171182154]
We present an approach for identifying near- and duplicate 3D medical images leveraging publicly available 2D computer vision embeddings.
We generate an experimental benchmark based on the publicly available Medical Decathlon dataset.
arXiv Detail & Related papers (2023-12-12T13:52:55Z)
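One plausible realisation of the approach above: embed each 2D slice with an off-the-shelf model, pool into a single volume vector, and flag pairs above a cosine-similarity threshold. The embedder and the threshold value are assumptions.
```python
# Hedged sketch: near-duplicate detection for 3D volumes via pooled 2D
# slice embeddings and a cosine threshold (both choices assumed).
import numpy as np

def volume_embedding(slice_embeddings):           # (n_slices, d)
    v = slice_embeddings.mean(axis=0)             # average-pool the slices
    return v / np.linalg.norm(v)

def find_duplicates(volumes, threshold=0.98):
    embs = np.stack([volume_embedding(v) for v in volumes])
    sims = embs @ embs.T                          # pairwise cosine similarity
    i, j = np.triu_indices(len(volumes), k=1)     # unique pairs only
    return [(a, b) for a, b in zip(i, j) if sims[a, b] >= threshold]
```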
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
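The text-to-image retrieval benchmark used above can be sketched as ranking all images by similarity to each report embedding in the shared space and scoring Recall@K; the encoders producing the embeddings are assumed to be given.
```python
# Hedged sketch: Recall@K for text-to-image retrieval in a joint space.
import numpy as np

def recall_at_k(text_embs, image_embs, k=10):
    # Row i of each matrix belongs to the same report/X-ray pair,
    # and rows are L2-normalised.
    sims = text_embs @ image_embs.T               # (n_texts, n_images)
    ranks = (-sims).argsort(axis=1)               # best image first per text
    hits = (ranks[:, :k] == np.arange(len(sims))[:, None]).any(axis=1)
    return hits.mean()                            # fraction retrieved in top-k
```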
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Multimorbidity Content-Based Medical Image Retrieval Using Proxies [37.47987844057842]
We propose a novel multi-label metric learning method that can be used for both classification and content-based image retrieval.
Our model is able to support diagnosis by predicting the presence of diseases and provide evidence for these predictions.
We demonstrate the efficacy of our approach to both classification and content-based image retrieval on two multimorbidity radiology datasets.
arXiv Detail & Related papers (2022-11-22T11:23:53Z)
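One common way to realise the proxy-based multi-label metric learning named above, sketched under assumptions (one learnable proxy per disease, scaled cosine logits, binary cross-entropy); the paper's exact objective may differ.
```python
# Hedged sketch: multi-label metric learning with per-label proxies.
# Scale factor and loss form are assumptions, not the paper's objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelProxyLoss(nn.Module):
    def __init__(self, n_labels, dim, scale=16.0):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(n_labels, dim))
        self.scale = scale

    def forward(self, embeddings, targets):       # targets: multi-hot (B, L)
        # Cosine similarity of each embedding to each label proxy is a logit.
        sims = F.normalize(embeddings) @ F.normalize(self.proxies).T
        return F.binary_cross_entropy_with_logits(self.scale * sims, targets)

loss_fn = MultiLabelProxyLoss(n_labels=14, dim=128)
loss = loss_fn(torch.randn(8, 128), torch.randint(0, 2, (8, 14)).float())
```
The same proxy similarities double as disease predictions and as a ranking score for retrieval, which matches the entry's dual classification/retrieval use.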
- Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures.
We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
- Semantic segmentation of multispectral photoacoustic images using deep learning [53.65837038435433]
Photoacoustic imaging has the potential to revolutionise healthcare.
Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information.
We present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images.
arXiv Detail & Related papers (2021-05-20T09:33:55Z)
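A toy fully convolutional baseline for the per-pixel task described above; the channel counts, depth, and class count are illustrative stand-ins, not the paper's model.
```python
# Hedged sketch: per-pixel semantic segmentation of multispectral input,
# mapping spectral channels to class logits with a small conv net.
import torch
import torch.nn as nn

def seg_net(in_channels=16, n_classes=5):
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, n_classes, 1),              # per-pixel class logits
    )

net = seg_net()
spectra = torch.randn(2, 16, 128, 128)            # (batch, wavelengths, H, W)
labels = torch.randint(0, 5, (2, 128, 128))
loss = nn.functional.cross_entropy(net(spectra), labels)
```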
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
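A toy adversarial training step with a U-Net-style generator, standing in for the paper's architecture; every module size here is illustrative, and the single-skip "U-Net" is a deliberate simplification.
```python
# Hedged sketch: adversarial augmentation with a skip-connected generator
# and a patch discriminator. Architectures are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    def __init__(self, ch=1):
        super().__init__()
        self.down = nn.Conv2d(ch, 32, 4, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1)
        self.out = nn.Conv2d(32 + ch, ch, 3, padding=1)   # skip connection

    def forward(self, x):
        h = self.up(F.relu(self.down(x)))
        return torch.tanh(self.out(torch.cat([h, x], dim=1)))

disc = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
                     nn.Conv2d(32, 1, 4, 2, 1))           # patch logits
gen = TinyUNet()
real = torch.randn(4, 1, 64, 64)
fake = gen(torch.randn(4, 1, 64, 64))

bce = F.binary_cross_entropy_with_logits
d_loss = bce(disc(real), torch.ones_like(disc(real))) + \
         bce(disc(fake.detach()), torch.zeros_like(disc(real)))
g_loss = bce(disc(fake), torch.ones_like(disc(real)))     # fool the critic
```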
- Contrastive Learning of Medical Visual Representations from Paired Images and Text [38.91117443316013]
We propose ConVIRT, an unsupervised strategy to learn medical visual representations by exploiting naturally occurring descriptive paired text.
Our new method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input.
arXiv Detail & Related papers (2020-10-02T02:10:18Z)
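The bidirectional contrastive objective named above has the general form of an InfoNCE loss applied in both directions over in-batch image/text pairs; the sketch below shows that general form with an illustrative temperature, not ConVIRT's exact hyperparameters.
```python
# Sketch of a bidirectional InfoNCE objective over paired embeddings:
# matched image/text pairs attract, in-batch non-pairs repel, both ways.
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(img, txt, tau=0.1):
    img, txt = F.normalize(img, dim=1), F.normalize(txt, dim=1)
    logits = img @ txt.T / tau                    # (batch, batch) similarities
    labels = torch.arange(len(img))               # pair i matches pair i
    loss_i2t = F.cross_entropy(logits, labels)    # image -> text direction
    loss_t2i = F.cross_entropy(logits.T, labels)  # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)

loss = bidirectional_contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
```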