Task-driven Prompt Evolution for Foundation Models
- URL: http://arxiv.org/abs/2310.17128v1
- Date: Thu, 26 Oct 2023 04:08:07 GMT
- Title: Task-driven Prompt Evolution for Foundation Models
- Authors: Rachana Sathish, Rahul Venkataramani, K S Shriram, Prasad Sudhakar
- Abstract summary: We propose a plug-and-play Prompt Optimization Technique for foundation models like SAM (SAMPOT).
We demonstrate the utility of SAMPOT on lung segmentation in chest X-ray images.
- Score: 0.8192907805418581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Promptable foundation models, particularly Segment Anything Model (SAM), have
emerged as a promising alternative to the traditional task-specific supervised
learning for image segmentation. However, many evaluation studies have found
their performance on medical imaging modalities to be underwhelming compared
to that of conventional deep learning methods. In the world of large
pre-trained language and vision-language models, learning prompts from
downstream tasks has achieved considerable success in improving performance. In
this work, we propose a plug-and-play Prompt Optimization Technique for
foundation models like SAM (SAMPOT) that utilizes the downstream segmentation
task to optimize the human-provided prompt to obtain improved performance. We
demonstrate the utility of SAMPOT on lung segmentation in chest X-ray images
and obtain an improvement on a significant number of cases ($\sim75\%$) over
human-provided initial prompts. We hope this work will lead to further
investigations in the nascent field of automatic visual prompt-tuning.
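For concreteness, the core idea can be illustrated with SAM's point-prompt interface: perturb the human-provided click, score each resulting segmentation with a task-derived (label-free) quality score, and keep the prompt that scores best. The sketch below is a minimal illustration of this loop; the random-perturbation search and the `score_fn` argument are assumptions standing in for SAMPOT's actual candidate-generation and scoring scheme.

```python
import numpy as np

def optimize_point_prompt(predictor, init_point, score_fn,
                          radius=20.0, n_candidates=16, seed=0):
    """Search over point-prompt locations around a human-provided click.

    `predictor` is any SAM-style predictor exposing `.predict(point_coords,
    point_labels, multimask_output)`; `score_fn` maps a binary mask to a
    task-driven quality score (higher is better). Illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Candidate prompts: the original click plus random perturbations around it.
    candidates = [np.asarray(init_point, dtype=float)]
    candidates += [candidates[0] + rng.uniform(-radius, radius, size=2)
                   for _ in range(n_candidates)]

    best = (None, None, -np.inf)  # (point, mask, score)
    for point in candidates:
        # Single foreground point (label 1), as in SAM's point-prompt API.
        masks, _, _ = predictor.predict(
            point_coords=point[None, :],
            point_labels=np.array([1]),
            multimask_output=False,
        )
        score = score_fn(masks[0])
        if score > best[2]:
            best = (point, masks[0], score)
    return best
```

With the official `segment_anything` package, `predictor` would be a `SamPredictor` built from `sam_model_registry` weights, with `predictor.set_image(xray)` called once per image; the scoring function could be, for example, the output of an auxiliary segmentation-quality network, though that choice is an assumption here rather than the paper's stated design.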
Related papers
- Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into an in-context task and has become a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection [57.666055329221194]
We investigate the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to infrared small object detection tasks.
Our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches.
arXiv Detail & Related papers (2024-09-07T05:31:24Z) - How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model [12.051904886550956]
This work summarizes existing fine-tuning strategies with various backbone architectures, model components, and fine-tuning algorithms across 18 combinations.
We evaluate them on 17 datasets covering all common radiology modalities.
We release our code and MRI-specific fine-tuned weights, which consistently obtained superior performance over the original SAM.
arXiv Detail & Related papers (2024-04-15T17:31:32Z) - Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
The latent diffusion model (LDM) proves to be an effective, minimalist approach for in-context segmentation.
We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z) - Multi-organ Self-supervised Contrastive Learning for Breast Lesion
Segmentation [0.0]
This paper employs multi-organ datasets for pre-training models tailored to specific organ-related target tasks.
Our target task is breast tumour segmentation in ultrasound images.
Results show that conventional contrastive learning pre-training improves performance compared to supervised baseline approaches.
arXiv Detail & Related papers (2024-02-21T20:29:21Z) - Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning [13.964106147449051]
Existing solutions concentrate on fine-tuning the pre-trained models on conventional image datasets.
We propose a novel and effective framework based on learning Visual Prompts (VPT) in pre-trained Vision Transformers (ViT).
We demonstrate that our new approximations, which incorporate semantic information, yield superior representational capability.
arXiv Detail & Related papers (2024-02-04T04:42:05Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification [11.202967500669402]
Few-shot learning has been studied to adapt models to tasks with very few samples.
We propose a novel approach that contextualizes labels via large language models (LLMs).
Our findings reveal that the context generated by LLMs significantly enhances the discrimination of semantic embeddings for similar categories.
arXiv Detail & Related papers (2023-12-12T09:58:07Z) - Self-Prompting Large Vision Models for Few-Shot Medical Image
Segmentation [14.135249795318591]
We propose a novel perspective on self-prompting in medical vision applications.
We harness the embedding space of the Segment Anything Model to prompt itself through a simple yet effective linear pixel-wise classifier; a sketch of this idea follows the list below.
We achieve competitive results on multiple datasets.
arXiv Detail & Related papers (2023-08-15T08:20:07Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
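As a rough sketch of the self-prompting idea mentioned above (a linear pixel-wise classifier over SAM's embedding space), the snippet below derives a point prompt from a frozen image embedding. The classifier weights are assumed to be trained separately on a few labeled examples, and the thresholding and centroid heuristic are illustrative assumptions, not that paper's exact design.

```python
import numpy as np

def self_prompt_from_embedding(image_embedding, weight, bias, threshold=0.5):
    """Turn SAM's frozen image embedding into a point prompt.

    `image_embedding`: (C, H, W) feature map from SAM's image encoder.
    `weight` (C,), `bias` (scalar): a linear pixel-wise classifier assumed to
    be trained separately on a handful of labeled examples.
    """
    c, h, w = image_embedding.shape
    feats = image_embedding.reshape(c, -1).T          # (H*W, C)
    prob = 1.0 / (1.0 + np.exp(-(feats @ weight + bias)))
    coarse_mask = (prob >= threshold).reshape(h, w)

    if not coarse_mask.any():
        return None  # classifier found no foreground; fall back to a default prompt
    ys, xs = np.nonzero(coarse_mask)
    # Centroid of the coarse foreground, in embedding-grid coordinates;
    # scale to image resolution before passing it back to SAM as a point prompt.
    return np.array([xs.mean(), ys.mean()])
```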