SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues
- URL: http://arxiv.org/abs/2406.19364v3
- Date: Wed, 25 Sep 2024 12:03:54 GMT
- Title: SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues
- Authors: Yuxin Xie, Tao Zhou, Yi Zhou, Geng Chen
- Abstract summary: We present a novel framework, SimTxtSeg, that leverages simple text cues to generate high-quality pseudo-labels.
We evaluate our framework on two medical image segmentation tasks: colonic polyp segmentation and MRI brain tumor segmentation.
- Score: 11.856041847833666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised medical image segmentation is a challenging task that aims to reduce annotation cost while maintaining segmentation performance. In this paper, we present a novel framework, SimTxtSeg, that leverages simple text cues to generate high-quality pseudo-labels and simultaneously studies cross-modal fusion in training segmentation models. Our contribution consists of two key components: an effective Textual-to-Visual Cue Converter that produces visual prompts from text prompts on medical images, and a text-guided segmentation model with Text-Vision Hybrid Attention that fuses text and image features. We evaluate our framework on two medical image segmentation tasks: colonic polyp segmentation and MRI brain tumor segmentation, and achieve consistent state-of-the-art performance. Source code is available at: https://github.com/xyx1024/SimTxtSeg.
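The abstract describes a Text-Vision Hybrid Attention module that fuses text and image features. As a rough, hypothetical sketch (the shapes, function names, and residual fusion are assumptions, not the paper's actual implementation), such a fusion step can be expressed as cross-attention in which image patch features attend to text token embeddings:

```python
# Hypothetical sketch of a text-vision fusion step: image features attend
# to text features so that text cues condition the segmentation decoder.
# All shapes and names are invented for illustration.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_vision_hybrid_attention(img_feats, txt_feats):
    """Fuse text cues into image features via cross-attention.

    img_feats: (N, d) flattened image patch features (queries)
    txt_feats: (T, d) token embeddings of the text prompt (keys/values)
    Returns fused features of shape (N, d).
    """
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)  # (N, T) patch-token similarity
    attn = softmax(scores, axis=-1)                # attention over text tokens
    fused = attn @ txt_feats                       # text-conditioned features
    return img_feats + fused                       # residual fusion

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 8))  # 16 patches, 8-dim features
txt = rng.standard_normal((4, 8))   # 4 text tokens, e.g. a short prompt
out = text_vision_hybrid_attention(img, txt)
print(out.shape)  # (16, 8)
```

The fused features would then feed the segmentation head in place of the raw image features.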
Related papers
- Text-promptable Propagation for Referring Medical Image Sequence Segmentation [18.633874947279168]
Referring Medical Image Sequence Segmentation aims to segment the anatomical entities referred to by medical text prompts.
TPP supports the segmentation of arbitrary objects of interest based on cross-modal prompt fusion.
We curate a large and comprehensive benchmark covering 4 modalities and 20 different organs and lesions.
arXiv Detail & Related papers (2025-02-16T12:13:11Z) - Language-guided Medical Image Segmentation with Target-informed Multi-level Contrastive Alignments [13.94586574102162]
We propose a language-guided segmentation network with Target-informed Multi-level Contrastive Alignments (TMCA)
TMCA enables target-informed cross-modality alignments and fine-grained text guidance to bridge the pattern gaps in language-guided segmentation.
arXiv Detail & Related papers (2024-12-18T06:19:03Z) - Text4Seg: Reimagining Image Segmentation as Text Generation [32.230379277018194]
We introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem.
The key innovation is semantic descriptors, a new textual representation of segmentation masks in which each image patch is mapped to its corresponding text label.
We show that Text4Seg achieves state-of-the-art performance on multiple datasets by fine-tuning different MLLM backbones.
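The semantic-descriptor idea above can be illustrated with a minimal, hypothetical example: a mask is represented as a row-major grid of per-patch text labels, which a language model could emit as a token sequence and which decodes back to an integer mask. The label names, grid size, and helper function are invented for the example:

```python
# Toy illustration of the text-as-mask idea: per-patch text labels are
# decoded into an integer segmentation mask. Names are invented.

def descriptors_to_mask(descriptors, class_ids):
    """Decode a row-major grid of per-patch text labels into an integer mask."""
    return [[class_ids[label] for label in row] for row in descriptors]

descriptors = [
    ["bg",    "bg",    "tumor", "tumor"],
    ["bg",    "tumor", "tumor", "bg"],
    ["bg",    "bg",    "bg",    "bg"],
]
class_ids = {"bg": 0, "tumor": 1}
mask = descriptors_to_mask(descriptors, class_ids)
print(mask)  # [[0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 0, 0]]
```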
arXiv Detail & Related papers (2024-10-13T14:28:16Z) - Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation [28.24883865053459]
This paper aims to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations.
Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts.
A text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments.
arXiv Detail & Related papers (2024-04-05T17:25:17Z) - Segment Everything Everywhere All at Once [124.90835636901096]
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
arXiv Detail & Related papers (2023-04-13T17:59:40Z) - Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [63.84720380390935]
There exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used.
We propose an effective yet straightforward scheme named PTUnifier to unify the two types.
We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts.
arXiv Detail & Related papers (2023-02-17T15:43:42Z) - ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency [126.88107868670767]
We propose multi-View Consistent learning (ViewCo) for text-supervised semantic segmentation.
We first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.
We also propose cross-view segmentation consistency modeling to address the ambiguity issue of text supervision.
arXiv Detail & Related papers (2023-01-31T01:57:52Z) - Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation.
We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting.
Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
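The dual-task design above (one shared encoder feeding two independent decoders, for segmentation and lesion-region inpainting) can be sketched structurally as follows. The class and the toy stand-in callables are assumptions for illustration only; the actual model would use convolutional modules:

```python
# Hypothetical structural sketch of a dual-task network: a shared encoder
# feeds two independent decoders. The "networks" here are placeholder
# callables, not the paper's architecture.

class DualTaskNet:
    def __init__(self, encoder, seg_decoder, inpaint_decoder):
        self.encoder = encoder                  # shared feature extractor
        self.seg_decoder = seg_decoder          # predicts the lesion mask
        self.inpaint_decoder = inpaint_decoder  # reconstructs the masked region

    def forward(self, image):
        feats = self.encoder(image)  # features computed once, shared by both tasks
        return self.seg_decoder(feats), self.inpaint_decoder(feats)

# Toy stand-ins: the encoder doubles values; the decoders offset them.
net = DualTaskNet(
    encoder=lambda x: [2 * v for v in x],
    seg_decoder=lambda f: [v + 1 for v in f],
    inpaint_decoder=lambda f: [v - 1 for v in f],
)
seg, inpaint = net.forward([1, 2, 3])
print(seg, inpaint)  # [3, 5, 7] [1, 3, 5]
```

Sharing the encoder is what lets the self-supervised inpainting task regularize the features used for segmentation.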
arXiv Detail & Related papers (2023-01-12T08:19:46Z) - CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms state-of-the-art methods without any post-processing.
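The text-to-pixel alignment mentioned above can be sketched, in a hedged way, as a cosine-similarity map between one sentence embedding and per-pixel embeddings; the map then serves as a segmentation logit. The dimensions, thresholding, and function name are invented for this example and are not CRIS's actual decoder:

```python
# Hedged sketch of text-to-pixel alignment: one text vector is compared
# against each pixel feature; the similarity map acts as the mask logit.
# Dimensions and the naive threshold are assumptions.
import numpy as np

def text_to_pixel_alignment(pixel_feats, text_feat):
    """Cosine similarity between a text vector and each pixel feature.

    pixel_feats: (H, W, d), text_feat: (d,) -> similarity map (H, W).
    """
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    return p @ t

rng = np.random.default_rng(1)
sim = text_to_pixel_alignment(rng.standard_normal((4, 4, 8)),
                              rng.standard_normal(8))
mask = sim > 0  # naive thresholding in place of a learned decoder
print(sim.shape)  # (4, 4)
```

In training, a contrastive loss would pull the text embedding toward pixels inside the referred region and push it away from the rest.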
arXiv Detail & Related papers (2021-11-30T07:29:08Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.