Task-driven Prompt Evolution for Foundation Models
- URL: http://arxiv.org/abs/2310.17128v1
- Date: Thu, 26 Oct 2023 04:08:07 GMT
- Title: Task-driven Prompt Evolution for Foundation Models
- Authors: Rachana Sathish, Rahul Venkataramani, K S Shriram, Prasad Sudhakar
- Abstract summary: We propose a plug-and-play Prompt Optimization Technique for foundation models like SAM (SAMPOT).
We demonstrate the utility of SAMPOT on lung segmentation in chest X-ray images.
- Score: 0.8192907805418581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Promptable foundation models, particularly Segment Anything Model (SAM), have
emerged as a promising alternative to the traditional task-specific supervised
learning for image segmentation. However, many evaluation studies have found
their performance on medical imaging modalities to be underwhelming compared
to that of conventional deep learning methods. In the world of large
pre-trained language and vision-language models, learning prompts from
downstream tasks has achieved considerable success in improving performance. In
this work, we propose a plug-and-play Prompt Optimization Technique for
foundation models like SAM (SAMPOT) that utilizes the downstream segmentation
task to optimize the human-provided prompt to obtain improved performance. We
demonstrate the utility of SAMPOT on lung segmentation in chest X-ray images
and obtain an improvement on a significant number of cases ($\sim75\%$) over
human-provided initial prompts. We hope this work will lead to further
investigations in the nascent field of automatic visual prompt-tuning.
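For concreteness, the core idea can be illustrated with SAM's point-prompt interface: perturb the human-provided click, score each resulting segmentation with a task-derived (label-free) quality score, and keep the prompt that scores best. The sketch below is a minimal illustration of this loop; the random-perturbation search and the `score_fn` argument are assumptions standing in for SAMPOT's actual candidate-generation and scoring scheme.

```python
import numpy as np

def optimize_point_prompt(predictor, init_point, score_fn,
                          radius=20.0, n_candidates=16, seed=0):
    """Search over point-prompt locations around a human-provided click.

    `predictor` is any SAM-style predictor exposing `.predict(point_coords,
    point_labels, multimask_output)`; `score_fn` maps a binary mask to a
    task-driven quality score (higher is better). Illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Candidate prompts: the original click plus random perturbations around it.
    candidates = [np.asarray(init_point, dtype=float)]
    candidates += [candidates[0] + rng.uniform(-radius, radius, size=2)
                   for _ in range(n_candidates)]

    best = (None, None, -np.inf)  # (point, mask, score)
    for point in candidates:
        # Single foreground point (label 1), as in SAM's point-prompt API.
        masks, _, _ = predictor.predict(
            point_coords=point[None, :],
            point_labels=np.array([1]),
            multimask_output=False,
        )
        score = score_fn(masks[0])
        if score > best[2]:
            best = (point, masks[0], score)
    return best
```

With the official `segment_anything` package, `predictor` would be a `SamPredictor` built from `sam_model_registry` weights, with `predictor.set_image(xray)` called once per image; the scoring function could be, for example, the output of an auxiliary segmentation-quality network, though that choice is an assumption here rather than the paper's stated design.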
Related papers
- Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into an in-context task and has become a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection [57.666055329221194]
We investigate the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to infrared small object detection tasks.
Our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches.
arXiv Detail & Related papers (2024-09-07T05:31:24Z) - How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model [12.051904886550956]
This work summarizes existing fine-tuning strategies with various backbone architectures, model components, and fine-tuning algorithms across 18 combinations.
We evaluate them on 17 datasets covering all common radiology modalities.
We release our code and MRI-specific fine-tuned weights, which consistently obtained superior performance over the original SAM.
arXiv Detail & Related papers (2024-04-15T17:31:32Z) - Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
The latent diffusion model (LDM) proves to be an effective, minimalist approach for in-context segmentation.
We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z) - Multi-organ Self-supervised Contrastive Learning for Breast Lesion
Segmentation [0.0]
This paper employs multi-organ datasets for pre-training models tailored to specific organ-related target tasks.
Our target task is breast tumour segmentation in ultrasound images.
Results show that conventional contrastive learning pre-training improves performance compared to supervised baseline approaches.
arXiv Detail & Related papers (2024-02-21T20:29:21Z) - Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning [13.964106147449051]
Existing solutions concentrate on fine-tuning the pre-trained models on conventional image datasets.
We propose a novel and effective framework based on learning Visual Prompts (VPT) in pre-trained Vision Transformers (ViT).
We demonstrate that our new approximations, which incorporate semantic information, yield superior representational capability.
arXiv Detail & Related papers (2024-02-04T04:42:05Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification [11.202967500669402]
Few-shot learning has been studied to adapt models to tasks with very few samples.
We propose a novel approach that contextualizes labels via large language models (LLMs).
Our findings reveal that the context generated by LLMs significantly enhances the discrimination of semantic embeddings for similar categories.
arXiv Detail & Related papers (2023-12-12T09:58:07Z) - Self-Prompting Large Vision Models for Few-Shot Medical Image
Segmentation [14.135249795318591]
We propose a novel perspective on self-prompting in medical vision applications.
We harness the embedding space of the Segment Anything Model to prompt itself through a simple yet effective linear pixel-wise classifier; a sketch of this idea follows the list below.
We achieve competitive results on multiple datasets.
arXiv Detail & Related papers (2023-08-15T08:20:07Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
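As a rough sketch of the self-prompting idea mentioned above (a linear pixel-wise classifier over SAM's embedding space), the snippet below derives a point prompt from a frozen image embedding. The classifier weights are assumed to be trained separately on a few labeled examples, and the thresholding and centroid heuristic are illustrative assumptions, not that paper's exact design.

```python
import numpy as np

def self_prompt_from_embedding(image_embedding, weight, bias, threshold=0.5):
    """Turn SAM's frozen image embedding into a point prompt.

    `image_embedding`: (C, H, W) feature map from SAM's image encoder.
    `weight` (C,), `bias` (scalar): a linear pixel-wise classifier assumed to
    be trained separately on a handful of labeled examples.
    """
    c, h, w = image_embedding.shape
    feats = image_embedding.reshape(c, -1).T          # (H*W, C)
    prob = 1.0 / (1.0 + np.exp(-(feats @ weight + bias)))
    coarse_mask = (prob >= threshold).reshape(h, w)

    if not coarse_mask.any():
        return None  # classifier found no foreground; fall back to a default prompt
    ys, xs = np.nonzero(coarse_mask)
    # Centroid of the coarse foreground, in embedding-grid coordinates;
    # scale to image resolution before passing it back to SAM as a point prompt.
    return np.array([xs.mean(), ys.mean()])
```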