Segment as You Wish -- Free-Form Language-Based Segmentation for Medical Images
- URL: http://arxiv.org/abs/2410.12831v1
- Date: Wed, 02 Oct 2024 16:34:32 GMT
- Title: Segment as You Wish -- Free-Form Language-Based Segmentation for Medical Images
- Authors: Longchao Da, Rui Wang, Xiaojian Xu, Parminder Bhatia, Taha Kass-Hout, Hua Wei, Cao Xiao
- Abstract summary: We introduce FLanS, a novel medical image segmentation model that handles various free-form text prompts.
FLanS is trained on a large-scale dataset of over 100k medical images from 7 public datasets.
- Score: 30.673958586581904
- Abstract: Medical imaging is crucial for diagnosing a patient's health condition, and accurate segmentation of these images is essential for isolating regions of interest to ensure precise diagnosis and treatment planning. Existing methods primarily rely on bounding boxes or point-based prompts, while few have explored text-related prompts, despite clinicians often describing their observations and instructions in natural language. To address this gap, we first propose a RAG-based free-form text prompt generator that leverages the domain corpus to generate diverse and realistic descriptions. Then, we introduce FLanS, a novel medical image segmentation model that handles various free-form text prompts, including professional anatomy-informed queries, anatomy-agnostic position-driven queries, and anatomy-agnostic size-driven queries. Additionally, our model incorporates a symmetry-aware canonicalization module to ensure consistent, accurate segmentations across varying scan orientations and to reduce confusion between the anatomical position of an organ and its appearance in the scan. FLanS is trained on a large-scale dataset of over 100k medical images from 7 public datasets. Comprehensive experiments demonstrate the model's superior language understanding and segmentation precision, along with a deep comprehension of the relationship between them, outperforming SOTA baselines on both in-domain and out-of-domain datasets.
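The abstract does not include code, but the core interface (a free-form text prompt conditioning a segmentation decoder) can be illustrated with a minimal sketch. Everything below, including `TextPromptedSegmenter` and the FiLM-like gating, is a hypothetical illustration rather than the actual FLanS architecture:
```python
# Minimal sketch of a text-prompted segmenter; all names are hypothetical.
import torch
import torch.nn as nn

class TextPromptedSegmenter(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.text_embed = nn.EmbeddingBag(vocab_size, dim)  # stand-in text encoder
        self.image_encoder = nn.Sequential(                 # stand-in image encoder
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(dim, 1, 1)                    # per-pixel mask logits

    def forward(self, image, token_ids):
        feats = self.image_encoder(image)        # (B, dim, H, W)
        query = self.text_embed(token_ids)       # (B, dim) pooled prompt embedding
        # Gate pixel features with the language query (FiLM-like conditioning).
        gated = feats * query[:, :, None, None].sigmoid()
        return self.head(gated)                  # (B, 1, H, W)

model = TextPromptedSegmenter()
scan = torch.randn(2, 1, 64, 64)                 # toy grayscale scans
prompt = torch.randint(0, 1000, (2, 8))          # toy tokenized free-form prompts
print(model(scan, prompt).shape)                 # torch.Size([2, 1, 64, 64])
```
In the real model, the text encoder would need to handle anatomy-informed, position-driven, and size-driven queries alike, and the symmetry-aware canonicalization module would normalize scan orientation before the image is encoded.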
Related papers
- SGSeg: Enabling Text-free Inference in Language-guided Segmentation of Chest X-rays via Self-guidance [10.075820470715374]
We propose a self-guided segmentation framework (SGSeg) that leverages language guidance for training (multi-modal) while enabling text-free inference (uni-modal).
We exploit the critical location information of both pulmonary and pathological structures depicted in the text reports and introduce a novel localization-enhanced report generation (LERG) module to generate clinical reports for self-guidance.
Our LERG integrates an object detector and a location-based attention aggregator, weakly-supervised by a location-aware pseudo-label extraction module.
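A rough sketch of the self-guidance idea, with a toy backbone and a stand-in for the report generator (all names, e.g. `SelfGuidedSegmenter`, are hypothetical): when no report is supplied, the model synthesizes its own guidance embedding from the image, enabling text-free inference.
```python
# Minimal sketch of self-guided, text-free inference; names are hypothetical.
import torch
import torch.nn as nn

class SelfGuidedSegmenter(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.backbone = nn.Conv2d(1, dim, 3, padding=1)
        self.report_head = nn.Linear(dim, dim)  # stand-in for a report generator
        self.seg_head = nn.Conv2d(dim, 1, 1)

    def forward(self, image, text_feat=None):
        feats = self.backbone(image).relu()          # (B, dim, H, W)
        if text_feat is None:                        # no report at test time:
            pooled = feats.mean(dim=(2, 3))          # derive guidance from the image
            text_feat = self.report_head(pooled)     # pseudo-report embedding
        guided = feats * text_feat[:, :, None, None].sigmoid()
        return self.seg_head(guided)

model = SelfGuidedSegmenter()
xray = torch.randn(1, 1, 64, 64)
mask_logits = model(xray)  # text-free inference
```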
arXiv Detail & Related papers (2024-09-07T08:16:00Z)
- LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation [7.912408164613206]
Medical Image Referring Segmentation (MIRS) requires segmenting lesions in images based on the given language expressions.
We propose an approach named Language-guided Scale-aware MedSegmentor (LSMS).
Our LSMS consistently outperforms existing methods on all datasets, at lower computational cost.
arXiv Detail & Related papers (2024-08-30T15:22:13Z)
- Scribble-Based Interactive Segmentation of Medical Hyperspectral Images [4.675955891956077]
This work introduces a scribble-based interactive segmentation framework for medical hyperspectral images.
The proposed method utilizes deep learning for feature extraction and a geodesic distance map generated from user-provided scribbles.
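The geodesic distance map can be illustrated with a small Dijkstra-style sketch on a 2D grid, where the step cost accumulates intensity differences so distances stay small inside homogeneous tissue; the paper's exact formulation for hyperspectral data may differ.
```python
# Dijkstra-style geodesic distance from user scribbles on a 2D grid.
import heapq
import numpy as np

def geodesic_map(image, scribbles, lam=1.0):
    """image: (H, W) float array; scribbles: (H, W) bool seed mask."""
    H, W = image.shape
    dist = np.full((H, W), np.inf)
    heap = []
    for y, x in zip(*np.nonzero(scribbles)):
        dist[y, x] = 0.0
        heap.append((0.0, int(y), int(x)))
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                # Step cost = spatial length + weighted intensity change.
                nd = d + 1.0 + lam * abs(image[ny, nx] - image[y, x])
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist

img = np.random.rand(32, 32)
seeds = np.zeros((32, 32), dtype=bool)
seeds[16, 16] = True
dmap = geodesic_map(img, seeds)  # usable as a guidance map for segmentation
```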
arXiv Detail & Related papers (2024-08-05T12:33:07Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance compared with existing methods in the literature.
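A minimal sketch of what a text-guided contrastive mechanism can look like, assuming frozen text-encoder class embeddings and an InfoNCE-style objective; this illustrates the general idea, not the paper's exact loss.
```python
# InfoNCE between image features and per-class text embeddings.
import torch
import torch.nn.functional as F

def text_guided_contrastive(img_feats, text_feats, labels, tau=0.07):
    """img_feats: (B, D); text_feats: (C, D) class embeddings from a frozen
    text encoder; labels: (B,) class indices."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / tau      # (B, C) scaled cosine similarities
    return F.cross_entropy(logits, labels)

feats = torch.randn(8, 64, requires_grad=True)   # toy image features
class_text = torch.randn(4, 64)                  # toy frozen text embeddings
loss = text_guided_contrastive(feats, class_text, torch.randint(0, 4, (8,)))
loss.backward()
```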
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
- ContourDiff: Unpaired Image-to-Image Translation with Structural Consistency for Medical Imaging [14.487188068402178]
We introduce a novel metric to quantify the structural bias between domains, which must be accounted for to achieve proper translation.
We then propose ContourDiff, a novel image-to-image translation algorithm that leverages domain-invariant anatomical contour representations.
We evaluate our method on challenging lumbar spine and hip-and-thigh CT-to-MRI translation tasks.
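The contour-conditioning idea can be illustrated by extracting a simple gradient-magnitude edge map from a source-domain slice; the paper's actual contour representation may differ.
```python
# Gradient-magnitude edge map as a stand-in contour representation.
import numpy as np

def contour_map(slice2d, pct=90):
    gy, gx = np.gradient(slice2d.astype(float))
    mag = np.hypot(gx, gy)
    return mag > np.percentile(mag, pct)  # binary contour mask

ct_slice = np.random.rand(64, 64)  # stand-in CT slice
condition = contour_map(ct_slice)  # would condition the diffusion sampler
```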
arXiv Detail & Related papers (2024-03-16T03:33:52Z)
- Anatomical Structure-Guided Medical Vision-Language Pre-training [21.68719061251635]
We propose an Anatomical Structure-Guided (ASG) framework for learning medical visual representations.
For anatomical regions, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists.
For findings and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample.
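Treating findings and existence as image tags amounts to multi-label recognition; below is a minimal sketch with a linear tag head and binary cross-entropy (hypothetical shapes and names, not the ASG decoder).
```python
# Multi-label tag recognition with a linear head and BCE loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_tags, dim = 40, 128                     # e.g. 40 finding/existence tags
tag_head = nn.Linear(dim, num_tags)
img_feats = torch.randn(4, dim)             # pooled image features (B, D)
targets = torch.randint(0, 2, (4, num_tags)).float()  # per-sample tag labels

loss = F.binary_cross_entropy_with_logits(tag_head(img_feats), targets)
loss.backward()
```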
arXiv Detail & Related papers (2024-03-14T11:29:47Z)
- Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query [56.54255735943497]
We introduce a novel Region-based contrastive pretraining framework for Medical Image Retrieval (RegionMIR).
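A minimal sketch of region-based contrastive pretraining, assuming ROI-crop embeddings with anatomical-region labels as positives; this illustrates the scheme, not the RegionMIR implementation.
```python
# Supervised InfoNCE over ROI-crop embeddings grouped by anatomical region.
import torch
import torch.nn.functional as F

def region_nce(region_embs, region_ids, tau=0.1):
    """region_embs: (N, D) embeddings of ROI crops;
    region_ids: (N,) anatomical-region labels (positives share a label)."""
    z = F.normalize(region_embs, dim=-1)
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = (z @ z.t() / tau).masked_fill(eye, float('-inf'))  # drop self-pairs
    pos = (region_ids[:, None] == region_ids[None, :]) & ~eye
    log_p = sim.log_softmax(dim=1)
    # Average log-likelihood assigned to same-region crops, per anchor.
    return -(log_p.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)).mean()

embs = torch.randn(16, 64, requires_grad=True)
ids = torch.randint(0, 4, (16,))
region_nce(embs, ids).backward()
```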
arXiv Detail & Related papers (2023-05-09T16:46:33Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, as well as the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of features of the same class.
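The global-correlation idea can be sketched as correlating a masked-average support prototype with every query-pixel feature; this illustrates the general prototype-correlation scheme, not the paper's network.
```python
# Correlate a masked-average support prototype with query-pixel features.
import torch
import torch.nn.functional as F

def correlation_map(support_feats, support_mask, query_feats):
    """support_feats, query_feats: (D, H, W); support_mask: (H, W) in {0, 1}."""
    D = support_feats.shape[0]
    m = support_mask.flatten()
    proto = (support_feats.reshape(D, -1) * m).sum(1) / m.sum()  # class prototype
    proto = F.normalize(proto, dim=0)
    q = F.normalize(query_feats, dim=0)          # unit feature vector per pixel
    return torch.einsum('d,dhw->hw', proto, q)   # cosine correlation map

sf, qf = torch.randn(32, 16, 16), torch.randn(32, 16, 16)
mask = torch.zeros(16, 16)
mask[4:12, 4:12] = 1
corr = correlation_map(sf, mask, qf)  # high values suggest the target class
```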
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.