One-shot Localization and Segmentation of Medical Images with Foundation
Models
- URL: http://arxiv.org/abs/2310.18642v1
- Date: Sat, 28 Oct 2023 08:58:20 GMT
- Title: One-shot Localization and Segmentation of Medical Images with Foundation
Models
- Authors: Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag,
Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh
Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout
- Abstract summary: We show that the models trained on natural images can offer good performance on medical images.
We leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model, arriving at single-shot segmentation.
We also show that our single-shot method outperforms the recently proposed few-shot segmentation method, UniverSeg.
- Score: 7.9060536840474365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models
with their ability to capture rich semantic features of the image have been
used for image correspondence tasks on natural images. In this paper, we
examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP)
and SD models, trained exclusively on natural images, for solving the
correspondence problems on medical images. While many works have made a case
for in-domain training, we show that the models trained on natural images can
offer good performance on medical images across different modalities (CT, MR,
Ultrasound) sourced from various manufacturers, over multiple anatomical
regions (brain, thorax, abdomen, extremities), and on a wide variety of tasks.
Further, we leverage the correspondence with respect to a template image to
prompt a Segment Anything (SAM) model, arriving at single-shot segmentation and
achieving a Dice range of 62%-90% across tasks using just one image as
reference. We also show that our single-shot method outperforms the recently
proposed few-shot segmentation method, UniverSeg (Dice range 47%-80%), on most
of the semantic segmentation tasks (six out of seven) across medical imaging
modalities.
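As a rough illustration of the pipeline the abstract describes (the authors' code is not part of this listing), the sketch below matches an annotated template point to a target image using frozen DINOv2 patch features, then turns the match into a SAM point prompt. The model identifiers, preprocessing, and patch size are assumptions, not the paper's exact setup.

```python
# Hedged sketch of the one-shot pipeline in the abstract: locate the
# target-image point corresponding to an annotated template point via frozen
# DINOv2 patch features, then use it as a point prompt for SAM.
import torch
import torch.nn.functional as F

@torch.no_grad()
def patch_features(dinov2, image):
    """L2-normalized patch tokens; image is (1, 3, H, W), ImageNet-normalized,
    with H and W divisible by the ViT patch size (14 for dinov2_vits14)."""
    tokens = dinov2.forward_features(image)["x_norm_patchtokens"]  # (1, N, C)
    return F.normalize(tokens, dim=-1)

def match_point(template_feats, target_feats, tmpl_idx, grid_w, patch=14):
    """Nearest-neighbour match of one template patch by cosine similarity;
    returns (x, y) pixel coordinates of the matched patch centre."""
    sims = target_feats[0] @ template_feats[0, tmpl_idx]  # (N,) similarities
    idx = int(sims.argmax())
    row, col = divmod(idx, grid_w)
    return (col * patch + patch // 2, row * patch + patch // 2)

# Usage (hypothetical): dinov2 = torch.hub.load("facebookresearch/dinov2",
# "dinov2_vits14"); the matched (x, y) then seeds SAM's predictor, e.g.
# predictor.predict(point_coords=np.array([[x, y]]), point_labels=np.array([1])).
```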
Related papers
- TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification [11.202967500669402]
Few-shot learning has been studied to adapt models to tasks with very few samples.
We propose a novel approach that contextualizes labels via large language models (LLMs).
Our findings reveal that the context generated by LLMs significantly enhances the discrimination of semantic embeddings for similar categories.
arXiv Detail & Related papers (2023-12-12T09:58:07Z)
- Multi-Prompt Fine-Tuning of Foundation Models for Enhanced Medical Image
Segmentation [10.946806607643689]
The Segment Anything Model (SAM) is a powerful foundation model that introduced revolutionary advancements in natural image segmentation.
In this study, we introduce a novel fine-tuning framework that leverages SAM's ability to bundle and process multiple prompts per image.
arXiv Detail & Related papers (2023-10-03T19:05:00Z)
- MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image
Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named MA-SAM.
Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
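The blurb names the mechanism but not its shape; below is a speculative reading (ours, not the authors' code) of what a 3D adapter inside a frozen 2D ViT block could look like: a low-rank bottleneck with a depthwise convolution along the slice axis, so per-slice 2D features can exchange information across neighbouring slices. All dimensions and placement are assumptions.

```python
# Hedged sketch of a 3D adapter of the kind the MA-SAM blurb describes.
# Bottleneck width, kernel size, and residual placement are assumptions.
import torch
import torch.nn as nn

class Adapter3D(nn.Module):
    def __init__(self, dim, depth_kernel=3, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        # Depthwise 3D conv over (depth, h, w): mixes features across
        # neighbouring slices while leaving the in-plane layout untouched.
        self.depth_conv = nn.Conv3d(bottleneck, bottleneck,
                                    kernel_size=(depth_kernel, 1, 1),
                                    padding=(depth_kernel // 2, 0, 0),
                                    groups=bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x, depth, hw):
        # x: (B*depth, N, dim) patch tokens from the frozen 2D encoder, N = h*w
        b = x.shape[0] // depth
        h, w = hw
        z = self.down(x)                                        # (B*D, N, k)
        z = z.view(b, depth, h, w, -1).permute(0, 4, 1, 2, 3)   # (B, k, D, H, W)
        z = self.depth_conv(z).permute(0, 2, 3, 4, 1)           # (B, D, H, W, k)
        z = z.reshape(b * depth, h * w, -1)
        return x + self.up(z)  # residual: only adapter weights are trained
```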
arXiv Detail & Related papers (2023-09-16T02:41:53Z)
- Towards Segment Anything Model (SAM) for Medical Image Segmentation: A
Survey [8.76496233192512]
We discuss efforts to extend the success of the Segment Anything Model to medical image segmentation tasks.
Many insights are drawn to guide future research to develop foundation models for medical image analysis.
arXiv Detail & Related papers (2023-05-05T16:48:45Z)
- Zero-shot performance of the Segment Anything Model (SAM) in 2D medical
imaging: A comprehensive evaluation and practical guidelines [0.13854111346209866]
Segment Anything Model (SAM) harnesses a massive training dataset to segment nearly any object.
Our findings reveal that SAM's zero-shot performance is not only comparable, but in certain cases, surpasses the current state-of-the-art.
We propose practical guidelines that require minimal interaction while consistently yielding robust outcomes.
arXiv Detail & Related papers (2023-04-28T22:07:24Z)
- Generalist Vision Foundation Models for Medical Imaging: A Case Study of
Segment Anything Model on Zero-Shot Medical Segmentation [5.547422331445511]
We report quantitative and qualitative zero-shot segmentation results on nine medical image segmentation benchmarks.
Our study indicates the versatility of generalist vision foundation models on medical imaging.
arXiv Detail & Related papers (2023-04-25T08:07:59Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
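To make the sampling idea concrete: because the reverse diffusion process starts from fresh noise on every run, repeating it gives independent draws from the learned mask distribution, and their disagreement localizes ambiguity. A toy sketch, with `reverse_step` as a hypothetical stand-in for one denoising update of a trained, image-conditioned mask-diffusion model (not the paper's code):

```python
# Toy illustration: sampling the reverse diffusion process several times
# yields several plausible masks; their variance indicates ambiguity.
import torch

@torch.no_grad()
def sample_masks(reverse_step, image, n_samples=5, steps=50):
    masks = []
    for _ in range(n_samples):                    # each loop = one draw
        x = torch.randn(1, 1, *image.shape[-2:])  # start from pure noise
        for t in reversed(range(steps)):
            x = reverse_step(x, image, t)         # one reverse-diffusion update
        masks.append((x > 0).float())             # threshold to a binary mask
    samples = torch.stack(masks)                  # (n_samples, 1, 1, H, W)
    return samples.mean(0), samples.var(0)        # consensus mask, ambiguity map
```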
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language
Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders
for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE to brain tumor segmentation where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- Universal Model for Multi-Domain Medical Image Retrieval [88.67940265012638]
Medical Image Retrieval (MIR) helps doctors quickly find similar patients' data.
MIR is becoming increasingly helpful due to the wide use of digital imaging modalities.
However, the popularity of various digital imaging modalities in hospitals also poses several challenges to MIR.
arXiv Detail & Related papers (2020-07-14T23:22:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.