Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection
- URL: http://arxiv.org/abs/2407.11188v1
- Date: Mon, 15 Jul 2024 19:22:32 GMT
- Title: Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection
- Authors: Chenwei Wu, David Restrepo, Zitao Shuai, Zhongming Liu, Liyue Shen
- Abstract summary: In this work, we propose a label-efficient in-context medical segmentation method by introducing a novel Meta-driven Visual Prompt Selection mechanism (MVPS).
MVPS is a flexible, finetuning-free module that can be easily plugged into different backbones and combined with other model-centric approaches.
- Score: 5.4498959901128226
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In-context learning (ICL) with Large Vision Models (LVMs) presents a promising avenue in medical image segmentation by reducing the reliance on extensive labeling. However, the ICL performance of LVMs highly depends on the choice of visual prompts and suffers from domain shifts. While existing works leveraging LVMs for medical tasks have focused mainly on model-centric approaches like fine-tuning, we study an orthogonal, data-centric perspective on how to select good visual prompts to facilitate generalization to the medical domain. In this work, we propose a label-efficient in-context medical segmentation method by introducing a novel Meta-driven Visual Prompt Selection mechanism (MVPS), where a prompt retriever obtained from a meta-learning framework actively selects the optimal images as prompts to promote model performance and generalizability. Evaluated on 8 datasets and 4 tasks across 3 medical imaging modalities, our proposed approach demonstrates consistent gains over existing methods under different scenarios, improving both computational and label efficiency. Finally, we show that MVPS is a flexible, finetuning-free module that can be easily plugged into different backbones and combined with other model-centric approaches.
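To make the data-centric idea concrete, below is a minimal sketch of retriever-based visual prompt selection: embed a query image and a pool of candidate prompt images, rank candidates by similarity, and keep the top-k as in-context prompts for a frozen segmentation LVM. All names here (`PromptRetriever`, the encoder, `select`) are hypothetical placeholders; the paper's actual retriever is trained with a meta-learning objective that is omitted, so this illustrates the mechanism rather than the authors' implementation.

```python
# Hypothetical sketch of retriever-based visual prompt selection.
# The meta-learning training loop from the paper is omitted; this only
# shows the inference-time ranking step under assumed module names.
import torch
import torch.nn.functional as F

class PromptRetriever(torch.nn.Module):
    """Ranks candidate prompt images by relevance to a query image."""

    def __init__(self, encoder: torch.nn.Module):
        super().__init__()
        self.encoder = encoder  # any frozen image encoder producing (B, d) embeddings

    @torch.no_grad()
    def select(self, query: torch.Tensor, candidates: torch.Tensor, k: int = 4):
        # Embed the query (C, H, W) and the n candidates (n, C, H, W),
        # then rank candidates by cosine similarity to the query.
        q = F.normalize(self.encoder(query.unsqueeze(0)), dim=-1)  # (1, d)
        c = F.normalize(self.encoder(candidates), dim=-1)          # (n, d)
        scores = (c @ q.T).squeeze(-1)                             # (n,)
        topk = torch.topk(scores, k=min(k, scores.numel())).indices
        return topk  # indices of the k most relevant prompt images
```

The selected indices would point to labeled (image, mask) pairs that are passed alongside the query to the frozen in-context backbone; since only the lightweight retriever does the selection, the LVM itself stays finetuning-free.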
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z)
- STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z)
- Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training [5.819704618007536]
A vision-language model (VLM) pre-trained on natural image-text pairs faces a significant domain barrier when applied to medical contexts.
We propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques.
arXiv Detail & Related papers (2024-05-30T04:04:36Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
- UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts [14.681493967465693]
We propose UniDCP, a Unified medical vision-language model with Dynamic Cross-modal learnable Prompts.
UniDCP is capable of performing all 8 medical uni-modal and cross-modal tasks over 14 corresponding datasets.
arXiv Detail & Related papers (2023-12-18T13:18:24Z)
- ScribbleVC: Scribble-supervised Medical Image Segmentation with Vision-Class Embedding [5.425414924685109]
ScribbleVC is a novel framework for scribble-supervised medical image segmentation.
The proposed method combines a scribble-based approach with a segmentation network and a class-embedding module to produce accurate segmentation masks.
We evaluate ScribbleVC on three benchmark datasets and compare it with state-of-the-art methods.
arXiv Detail & Related papers (2023-07-30T13:38:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight initialization approach for hybrid volumetric medical image segmentation models.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency [24.78258331561847]
This paper presents a novel scheme to learn the mutual benefits of different modalities to achieve better segmentation results for unpaired medical images.
We leverage a carefully designed External Attention Module (EAM) to align semantic class representations and their correlations of different modalities.
We have demonstrated the effectiveness of the proposed method on two medical image segmentation scenarios.
arXiv Detail & Related papers (2022-06-21T17:50:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.