LIMIS: Towards Language-based Interactive Medical Image Segmentation
- URL: http://arxiv.org/abs/2410.16939v1
- Date: Tue, 22 Oct 2024 12:13:47 GMT
- Title: LIMIS: Towards Language-based Interactive Medical Image Segmentation
- Authors: Lena Heinemann, Alexander Jaus, Zdravko Marinov, Moon Kim, Maria Francesca Spadea, Jens Kleesiek, Rainer Stiefelhagen
- Abstract summary: LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
- Score: 58.553786162527686
- Abstract: Within this work, we introduce LIMIS: The first purely language-based interactive medical image segmentation model. We achieve this by adapting Grounded SAM to the medical domain and designing a language-based model interaction strategy that allows radiologists to incorporate their knowledge into the segmentation process. LIMIS produces high-quality initial segmentation masks by leveraging medical foundation models and allows users to adapt segmentation masks using only language, opening up interactive segmentation to scenarios where physicians require using their hands for other tasks. We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability with experts from the medical domain confirming its high-quality segmentation masks and its interactive usability.
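The abstract describes an interaction loop in which a model proposes an initial mask and a radiologist refines it using only language. As an illustration only, here is a minimal sketch of what such a loop could look like, with hypothetical commands ("expand", "shrink") mapped to simple morphological operations on a binary mask. This is not the authors' actual interaction strategy or API; the command vocabulary and refinement operators are assumptions for demonstration.

```python
import numpy as np

def refine_mask(mask: np.ndarray, command: str) -> np.ndarray:
    """Apply a hypothetical language command to a binary mask.

    'expand' performs a one-pixel dilation, 'shrink' a one-pixel erosion,
    both using a 4-connected neighbourhood.
    """
    padded = np.pad(mask, 1)
    # Stack each pixel with its 4 neighbours, then take max (dilation)
    # or min (erosion) across the stack.
    neigh = np.stack([
        padded[1:-1, 1:-1],   # centre
        padded[:-2, 1:-1],    # up
        padded[2:, 1:-1],     # down
        padded[1:-1, :-2],    # left
        padded[1:-1, 2:],     # right
    ])
    if command == "expand":
        return neigh.max(axis=0)
    if command == "shrink":
        return neigh.min(axis=0)
    return mask  # unknown command: leave the mask unchanged

# Interactive loop: the model proposes a mask, the user refines it by language.
mask = np.zeros((7, 7), dtype=np.uint8)
mask[3, 3] = 1  # initial one-pixel "lesion" proposal
for cmd in ["expand", "expand", "shrink"]:
    mask = refine_mask(mask, cmd)
```

In a real system the command vocabulary would be richer (e.g. spatially grounded instructions) and parsed by a language model rather than matched literally, but the control flow — propose, refine by language, repeat — is the same.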
Related papers
- A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data.
Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied.
We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z) - LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation [7.912408164613206]
Medical Image Referring Segmentation (MIRS) requires segmenting lesions in images based on given language expressions.
We propose an approach named Language-guided Scale-aware MedSegmentor (LSMS).
Our LSMS consistently outperforms prior methods on all datasets at lower computational cost.
arXiv Detail & Related papers (2024-08-30T15:22:13Z) - Residual-based Language Models are Free Boosters for Biomedical Imaging [15.154015369984572]
In this study, we uncover the unexpected efficacy of residual-based large language models (LLMs) as part of encoders for biomedical imaging tasks.
We found that these LLMs could boost performance across a spectrum of biomedical imaging applications, including both 2D and 3D visual classification tasks.
As a byproduct, we found that the proposed framework achieved superior performance, setting new state-of-the-art results on extensive, standardized datasets in MedMNIST-2D and 3D.
arXiv Detail & Related papers (2024-03-26T03:05:20Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a doctor (the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
We present the first systematic study of transferring Vision-Language Segmentation Models (VLSMs) to 2D medical images.
Although VLSMs show competitive performance compared to image-only models for segmentation, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z) - Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey [8.76496233192512]
We discuss efforts to extend the success of the Segment Anything Model to medical image segmentation tasks.
Many insights are drawn to guide future research to develop foundation models for medical image analysis.
arXiv Detail & Related papers (2023-05-05T16:48:45Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge from the UMLS Metathesaurus during pre-training.
With these strategies, UmlsBERT encodes clinical domain knowledge into word embeddings and outperforms existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of entities that match a given natural-language description.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed (including its completeness or accuracy) and is not responsible for any consequences of its use.