Related papers: FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation

URL: http://arxiv.org/abs/2403.08059v3
Date: Wed, 25 Jun 2025 16:40:39 GMT
Title: FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation
Authors: Benjamin D. Killeen, Liam J. Wang, Blanca Inigo, Han Zhang, Mehran Armand, Russell H. Taylor, Greg Osgood, Mathias Unberath
Abstract summary: FluoroSAM is a language-promptable variant of the Segment Anything Model. It is capable of segmenting myriad anatomical structures and tools based on natural language prompts. We show how FluoroSAM is a key enabler for rich human-machine interaction in the X-ray image acquisition and analysis context.
Score: 11.55858990545478
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Language promptable X-ray image segmentation would enable greater flexibility for human-in-the-loop workflows in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving problems within a narrow scope, but expanding to broader use requires additional data, annotations, and training time. Recently, language-aligned foundation models (LFMs) -- machine learning models trained on large amounts of highly variable image and text data thus enabling broad applicability -- have emerged as promising tools for automated image analysis. Existing foundation models for medical image analysis focus on scenarios and modalities where large, richly annotated datasets are available. However, the X-ray imaging modality features highly variable image appearance and applications, from diagnostic chest X-rays to interventional fluoroscopy, with varying availability of data. To pave the way toward an LFM for comprehensive and language-aligned analysis of arbitrary medical X-ray images, we introduce FluoroSAM, a language-promptable variant of the Segment Anything Model, trained from scratch on 3M synthetic X-ray images from a wide variety of human anatomies, imaging geometries, and viewing angles. These include pseudo-ground truth masks for 128 organ types and 464 tools with associated text descriptions. FluoroSAM is capable of segmenting myriad anatomical structures and tools based on natural language prompts, thanks to the novel incorporation of vector quantization (VQ) of text embeddings in the training process. We demonstrate FluoroSAM's performance quantitatively on real X-ray images and showcase on several applications how FluoroSAM is a key enabler for rich human-machine interaction in the X-ray image acquisition and analysis context. Code is available at https://github.com/arcadelab/fluorosam.

Related papers

ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts. ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues. We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model [0.8670827427401335]
We propose a novel view-conditioned model for synthesizing multi-view X-ray images from a single view. Our approach leverages the Diffusion Transformer to preserve fine details and employs a weak-to-strong training strategy for stable high-resolution image generation. Experimental results demonstrate that our method generates higher-resolution outputs with improved control over viewing angles.
arXiv Detail & Related papers (2025-07-07T15:58:11Z)
PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
Multiplex Imaging Analysis in Pathology: a Comprehensive Review on Analytical Approaches and Digital Toolkits [0.7968706282619793]
Multiplexed imaging allows for simultaneous visualization of multiple biomarkers in a single section. Data from multiplexed imaging requires sophisticated computational methods for preprocessing, segmentation, feature extraction, and spatial analysis. PathML is an AI-powered platform that streamlines image analysis, making complex interpretation accessible for clinical and research settings.
arXiv Detail & Related papers (2024-11-01T18:02:41Z)
VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis [9.937830036053871]
VoxelPrompt tackles diverse radiological tasks through joint modeling of natural language, image volumes, and analytical metrics. We show that VoxelPrompt can delineate hundreds of anatomical and pathological features, measure many complex morphological properties, and perform open-language analysis of lesion characteristics.
arXiv Detail & Related papers (2024-10-10T22:11:43Z)
Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports. It is the first study to integrate French reports to shape the embedding space devoted to bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z)
MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation [2.2585213273821716]
We propose a novel framework, called MedCLIP-SAM, that combines CLIP and SAM models to generate segmentation of clinical scans. Extensive testing on three diverse segmentation tasks and medical image modalities shows that the proposed framework achieves excellent accuracy.
arXiv Detail & Related papers (2024-03-29T15:59:11Z)
Multi-Prompt Fine-Tuning of Foundation Models for Enhanced Medical Image Segmentation [10.946806607643689]
The Segment Anything Model (SAM) is a powerful foundation model that introduced revolutionary advancements in natural image segmentation. In this study, we introduce a novel fine-tuning framework that leverages SAM's ability to bundle and process multiple prompts per image.
arXiv Detail & Related papers (2023-10-03T19:05:00Z)
MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts [63.30352394004674]
Multi-task Self-supervised Continual Learning (MUSCLE) is a novel self-supervised pre-training pipeline for medical imaging tasks. MUSCLE aggregates X-rays collected from multiple body parts for representation learning, and adopts a well-designed continual learning procedure. We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection.
arXiv Detail & Related papers (2023-10-03T12:19:19Z)
Introducing Shape Prior Module in Diffusion Model for Medical Image Segmentation [7.7545714516743045]
We propose an end-to-end framework called VerseDiff-UNet, which leverages the denoising diffusion probabilistic model (DDPM). Our approach integrates the diffusion model into a standard U-shaped architecture. We evaluate our method on a single dataset of spine images acquired through X-ray imaging.
arXiv Detail & Related papers (2023-09-12T03:05:00Z)
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model. It can analyze and answer open-ended questions about chest radiographs. We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space. We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
Orientation-Shared Convolution Representation for CT Metal Artifact Learning [63.67718355820655]
During X-ray computed tomography (CT) scanning, metallic implants carried by patients often lead to adverse artifacts. Existing deep-learning-based methods have achieved promising reconstruction performance. We propose an orientation-shared convolution representation strategy to adapt the physical prior structures of artifacts.
arXiv Detail & Related papers (2022-12-26T13:56:12Z)
RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [7.618389245539657]
We develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays. We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts. We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images.
arXiv Detail & Related papers (2022-11-23T06:58:09Z)
Improving Chest X-Ray Classification by RNN-based Patient Monitoring [0.34998703934432673]
We analyze how information about diagnosis can improve CNN-based image classification models. We show that a model trained on additional patient history information outperforms a model trained without the information by a significant margin.
arXiv Detail & Related papers (2022-10-28T11:47:15Z)
Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records. The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning. We generate a corresponding radiology image in a target domain while preserving the identity of the patient. We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
Cross-Modal Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop [63.81818077092879]
We propose an end-to-end semi-supervised cross-modal contrastive learning framework for medical images. We first apply an image encoder to classify the chest X-rays and to generate the image features. The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray.
arXiv Detail & Related papers (2021-04-11T09:16:29Z)
SAM: Self-supervised Learning of Pixel-wise Anatomical Embeddings in Radiological Images [23.582516309813425]
We introduce Self-supervised Anatomical eMbedding (SAM) to learn the intrinsic structure from unlabeled images. SAM generates semantic embeddings for each image pixel that describes its anatomical location or body part. We demonstrate the effectiveness of SAM in multiple tasks with 2D and 3D image modalities.
arXiv Detail & Related papers (2020-12-04T03:31:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.