GazeSAM: What You See is What You Segment
- URL: http://arxiv.org/abs/2304.13844v1
- Date: Wed, 26 Apr 2023 22:18:29 GMT
- Title: GazeSAM: What You See is What You Segment
- Authors: Bin Wang, Armstrong Aboah, Zheyuan Zhang, Ulas Bagci
- Abstract summary: This study investigates the potential of eye-tracking technology and the Segment Anything Model (SAM) to design a collaborative human-computer interaction system that automates medical image segmentation.
We present the GazeSAM system to enable radiologists to collect segmentation masks by simply looking at the region of interest during image diagnosis.
- Score: 11.116729994007686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the potential of eye-tracking technology and the Segment Anything Model (SAM) to design a collaborative human-computer interaction system that automates medical image segmentation. We present the GazeSAM system to enable radiologists to collect segmentation masks by simply looking at the region of interest during image diagnosis. The proposed system tracks radiologists' eye movements and uses the eye-gaze data as the input prompt for SAM, which automatically generates the segmentation mask in real time. This study is the first work to leverage the power of eye-tracking technology and SAM to enhance the efficiency of daily clinical practice. Moreover, eye-gaze data coupled with images and corresponding segmentation labels can be easily recorded for further advanced eye-tracking research. The code is available at https://github.com/ukaukaaaa/GazeSAM.
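As a concrete illustration of the pipeline described above, here is a minimal sketch of using a gaze fixation as a SAM point prompt. It assumes the public segment-anything package and its released ViT-B checkpoint; the read_gaze_point function is a hypothetical stand-in for an eye-tracker SDK, and scan.png is a placeholder path.

```python
# Minimal sketch: a gaze fixation becomes SAM's point prompt.
# Assumes the public segment-anything package; the eye-tracker
# call is a hypothetical stand-in, not GazeSAM's actual code.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

def read_gaze_point():
    """Hypothetical stand-in for an eye-tracker SDK: returns the
    current fixation as (x, y) pixel coordinates in image space."""
    return 256, 192  # fixed dummy fixation for illustration

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scan.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # heavy image encoder runs once per image

x, y = read_gaze_point()
masks, scores, _ = predictor.predict(
    point_coords=np.array([[x, y]]),  # the fixation is the prompt
    point_labels=np.array([1]),       # 1 = foreground click
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring proposal
```

Since set_image runs SAM's heavy image encoder once per image, each new fixation only pays the cost of the lightweight prompt decoder, which is what makes per-gaze mask generation in real time plausible.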
Related papers
- Zero-Shot Gaze-based Volumetric Medical Image Segmentation [0.40964539027092917]
We introduce eye gaze as a novel informational modality for interactive segmentation.
We evaluate the performance of gaze-based prompts with SAM-2 and MedSAM-2 using both synthetic and real gaze data.
arXiv Detail & Related papers (2025-05-21T08:34:13Z)
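To make the volumetric idea concrete, the sketch below treats the slice stack as a video and propagates a single gaze-derived point prompt with SAM-2's video predictor. The config name, checkpoint path, slice directory, frame index, and fixation coordinates are all assumptions for illustration; only the predictor API comes from the sam2 package.

```python
# Hedged sketch: treat CT/MR slices as video frames and let SAM-2
# propagate one gaze-derived point prompt through the volume.
# Paths, config name, and the fixation itself are assumptions.
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml", "sam2.1_hiera_tiny.pt"
)
state = predictor.init_state(video_path="slices/")  # directory of JPEG slices

# One fixation on a middle slice becomes the only prompt.
predictor.add_new_points_or_box(
    inference_state=state, frame_idx=40, obj_id=1,
    points=np.array([[256.0, 192.0]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32),
)

volume_masks = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    volume_masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```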
- Organ-aware Multi-scale Medical Image Segmentation Using Text Prompt Engineering [17.273290949721975]
Existing medical image segmentation methods rely on uni-modal visual inputs, such as images or videos, requiring labor-intensive manual annotations.
Medical imaging techniques capture multiple intertwined organs within a single scan, further complicating segmentation accuracy.
To address these challenges, MedSAM was developed to enhance segmentation accuracy by integrating image features with user-provided prompts.
arXiv Detail & Related papers (2025-03-18T01:35:34Z)
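A brief sketch of the prompt-guided idea: one box prompt per organ, so multiple intertwined organs in a single scan are segmented separately. It assumes the released MedSAM checkpoint (a fine-tuned SAM ViT-B that loads through the same registry); the file paths and box coordinates are illustrative.

```python
# Sketch of prompt-guided segmentation in the MedSAM style: one box
# prompt per organ of interest. Checkpoint name, image path, and box
# coordinates are assumptions for illustration.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

medsam = sam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")
predictor = SamPredictor(medsam)
predictor.set_image(
    cv2.cvtColor(cv2.imread("abdomen_ct.png"), cv2.COLOR_BGR2RGB)
)

# Hypothetical boxes (x1, y1, x2, y2) for two intertwined organs.
organ_boxes = {"liver": [120, 80, 340, 260], "kidney": [300, 220, 420, 330]}
organ_masks = {}
for name, box in organ_boxes.items():
    masks, scores, _ = predictor.predict(
        box=np.array(box), multimask_output=False
    )
    organ_masks[name] = masks[0]  # one binary mask per prompted organ
```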
- Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation [47.789013598970925]
We propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation.
Our model outperforms state-of-the-art semi-supervised segmentation approaches.
arXiv Detail & Related papers (2024-12-18T11:19:23Z)
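The summary names the framework but not its losses, so the following is only a generic sketch of SAM-induced distillation: a student segmenter on unlabeled images is pulled toward soft masks produced by a SAM teacher via a pixel-wise KL term. KnowSAM's actual objective and learnable prompting are not reproduced.

```python
# Generic sketch of SAM-induced distillation on unlabeled images:
# the student is pushed toward soft masks from a SAM teacher. This
# is the textbook KD pattern, not KnowSAM's exact objective.
import torch

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    """Pixel-wise binary KL between tempered student predictions and
    teacher soft masks. student_logits: (B, 1, H, W) raw outputs;
    teacher_probs: (B, 1, H, W) in [0, 1]."""
    s = torch.sigmoid(student_logits / temperature)
    kl = teacher_probs * torch.log((teacher_probs + 1e-6) / (s + 1e-6)) + \
         (1 - teacher_probs) * torch.log(
             (1 - teacher_probs + 1e-6) / (1 - s + 1e-6))
    return kl.mean()

# Usage: teacher_probs would come from SAM prompted on the same image.
student_logits = torch.randn(2, 1, 128, 128, requires_grad=True)
teacher_probs = torch.rand(2, 1, 128, 128)
loss = distillation_loss(student_logits, teacher_probs)
loss.backward()
```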
- MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.
This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities.
We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
- MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans.
Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss.
We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further.
arXiv Detail & Related papers (2024-09-28T23:10:37Z)
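A hedged sketch of what a decoupled, hard-negative-weighted contrastive loss can look like: the positive pair is removed from the denominator (decoupling), and harder negatives are up-weighted. This follows the generic decoupled/hard-negative contrastive recipes and may differ from the paper's exact DHN-NCE formulation.

```python
# Sketch in the spirit of DHN-NCE: drop the positive from the
# denominator and up-weight negatives that sit close to the anchor.
# Generic recipe; MedCLIP-SAMv2's exact weighting may differ.
import torch
import torch.nn.functional as F

def dhn_nce(img_emb, txt_emb, tau=0.07, beta=1.0):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs."""
    sim = img_emb @ txt_emb.t() / tau             # (B, B) similarity matrix
    B = sim.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=sim.device)
    pos = sim.diagonal()                          # positive-pair similarities
    neg = sim.masked_fill(eye, float("-inf"))     # decouple: drop positives
    # Hard-negative weights: more similar negatives count more.
    w = (beta * neg).softmax(dim=1).detach() * (B - 1)
    neg_term = torch.logsumexp(neg + torch.log(w + 1e-12), dim=1)
    return (neg_term - pos).mean()

img = F.normalize(torch.randn(8, 512), dim=1)
txt = F.normalize(torch.randn(8, 512), dim=1)
loss = dhn_nce(img, txt)
```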
- CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM [2.9500242602590565]
CycleSAM is an approach to one-shot surgical scene segmentation that uses a single training image-mask pair at test time.
We employ a ResNet50 encoder pretrained on surgical images in a self-supervised fashion, thereby maintaining high label-efficiency.
arXiv Detail & Related papers (2024-07-09T12:08:07Z)
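The core matching idea can be sketched under simplifying assumptions (a single feature scale, cosine similarity): find the test-image location best matching the reference foreground, and accept it only if its match back into the reference lands inside the mask; the surviving location becomes SAM's point prompt. CycleSAM's full pipeline is richer than this.

```python
# Sketch of cycle-consistent feature matching to produce a point
# prompt. Single feature scale and cosine similarity are simplifying
# assumptions; this shows only the core matching-and-check idea.
import torch
import torch.nn.functional as F

def cycle_consistent_prompt(ref_feat, ref_mask, test_feat):
    """ref_feat/test_feat: (C, H, W) L2-normalized feature maps;
    ref_mask: (H, W) bool foreground mask on the reference image.
    Returns an (x, y) prompt in feature-grid coordinates, or None."""
    C, H, W = ref_feat.shape
    ref = ref_feat.flatten(1).t()    # (HW, C)
    test = test_feat.flatten(1).t()  # (HW, C)
    sim = ref @ test.t()             # (HW_ref, HW_test) cosine similarities

    fg = ref_mask.flatten().nonzero(as_tuple=True)[0]
    fwd_sim, fwd_idx = sim[fg].max(dim=1)   # best test match per fg pixel
    for i in fwd_sim.argsort(descending=True):  # strongest matches first
        t = fwd_idx[i]
        back = sim[:, t].argmax()               # match the test pixel back
        if ref_mask.flatten()[back]:            # cycle check: lands in mask?
            return int(t % W), int(t // W)
    return None

ref_feat = F.normalize(torch.randn(64, 32, 32), dim=0)
test_feat = F.normalize(torch.randn(64, 32, 32), dim=0)
ref_mask = torch.zeros(32, 32, dtype=torch.bool)
ref_mask[8:20, 8:20] = True
prompt_xy = cycle_consistent_prompt(ref_feat, ref_mask, test_feat)
```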
- MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation [2.2585213273821716]
We propose a novel framework, called MedCLIP-SAM, that combines CLIP and SAM models to generate segmentation of clinical scans.
Extensive testing on three diverse segmentation tasks and medical image modalities demonstrates the excellent accuracy of our proposed framework.
arXiv Detail & Related papers (2024-03-29T15:59:11Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
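As one plausible reading of gaze-guided alignment (not EGMA's published objective), the sketch below uses a normalized gaze heatmap as soft supervision for where image patches should attend to the report embedding.

```python
# Illustrative sketch: a gaze heatmap as soft supervision for where
# image patches should attend to the report text. One plausible
# reading of gaze-guided alignment, not EGMA's actual objective.
import torch
import torch.nn.functional as F

def gaze_alignment_loss(patch_emb, text_emb, gaze_heatmap, tau=0.07):
    """patch_emb: (N, D) normalized patch features; text_emb: (D,)
    normalized report embedding; gaze_heatmap: (N,) fixation density
    over patches, summing to 1."""
    attn = (patch_emb @ text_emb / tau).softmax(dim=0)  # model attention
    # Cross-entropy pushes model attention toward human fixations.
    return -(gaze_heatmap * torch.log(attn + 1e-8)).sum()

patch_emb = F.normalize(torch.randn(196, 256), dim=1)
text_emb = F.normalize(torch.randn(256), dim=0)
gaze = torch.rand(196)
gaze = gaze / gaze.sum()
loss = gaze_alignment_loss(patch_emb, text_emb, gaze)
```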
- Segment Anything Model-guided Collaborative Learning Network for Scribble-supervised Polyp Segmentation [45.15517909664628]
Polyp segmentation plays a vital role in accurately locating polyps at an early stage.
Pixel-wise annotation of polyp images by physicians during diagnosis is both time-consuming and expensive.
We propose a novel SAM-guided Collaborative Learning Network (SAM-CLNet) for scribble-supervised polyp segmentation.
arXiv Detail & Related papers (2023-12-01T03:07:13Z)
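The standard recipe implied by this setup can be sketched in a few lines: partial cross-entropy on the scribbled pixels only, plus a consistency term against a SAM-generated pseudo-mask. SAM-CLNet's collaborative-learning specifics are omitted; the shapes and scribble encoding are assumptions.

```python
# Sketch of scribble supervision with SAM guidance: partial
# cross-entropy on the few scribbled pixels plus consistency with a
# SAM-generated pseudo-mask. SAM-CLNet's specifics are omitted.
import torch
import torch.nn.functional as F

def scribble_sam_loss(logits, scribble, sam_mask, lam=0.5):
    """logits: (B, 1, H, W); scribble: (B, 1, H, W) with 1=fg, 0=bg,
    -1=unlabeled; sam_mask: (B, 1, H, W) pseudo-mask in [0, 1]."""
    labeled = scribble >= 0
    pce = F.binary_cross_entropy_with_logits(
        logits[labeled], scribble[labeled].float()  # scribbled pixels only
    )
    consistency = F.mse_loss(torch.sigmoid(logits), sam_mask)
    return pce + lam * consistency

logits = torch.randn(2, 1, 64, 64, requires_grad=True)
scribble = torch.full((2, 1, 64, 64), -1.0)
scribble[:, :, 30:34, 30:34] = 1.0          # a tiny foreground scribble
sam_mask = torch.rand(2, 1, 64, 64)
loss = scribble_sam_loss(logits, scribble, sam_mask)
loss.backward()
```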
- Learnable Ophthalmology SAM [7.179656139331778]
We propose a learnable prompt layer suitable for multiple target segmentation in ophthalmology multi-modal images.
The learnable prompt layer learns medical prior knowledge from each transformer layer.
We demonstrate the effectiveness of our approach on four medical segmentation tasks across nine publicly available datasets.
arXiv Detail & Related papers (2023-04-26T10:14:03Z)
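A generic sketch of a per-layer learnable prompt in the spirit of deep visual prompt tuning: each frozen transformer block gets its own trainable tokens prepended to the sequence. The ophthalmology-specific design is not reproduced; the block and sizes are toy choices.

```python
# Generic per-layer learnable prompt, in the spirit of deep visual
# prompt tuning: every frozen block gets its own trainable tokens.
# Not the paper's ophthalmology-specific design.
import torch
import torch.nn as nn

class PromptedBlock(nn.Module):
    def __init__(self, block, dim, n_prompts=8):
        super().__init__()
        self.block = block                   # frozen pretrained block
        for p in self.block.parameters():
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)

    def forward(self, x):                    # x: (B, N, D) token sequence
        B = x.size(0)
        p = self.prompts.expand(B, -1, -1)   # fresh prompts at this layer
        x = torch.cat([p, x], dim=1)         # prepend learnable tokens
        x = self.block(x)
        return x[:, p.size(1):]              # drop prompts before next layer

block = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
layer = PromptedBlock(block, dim=256)
out = layer(torch.randn(2, 196, 256))        # (2, 196, 256)
```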
- Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation.
We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting.
Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z)
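The stated architecture is easy to sketch: one shared encoder feeding two independent decoders, one producing segmentation logits and one inpainting the masked lesion region. Layer sizes below are toy choices.

```python
# Minimal sketch of the stated architecture: a shared encoder with
# two independent decoders (segmentation + lesion inpainting).
# Layer sizes are toy choices, not the paper's configuration.
import torch
import torch.nn as nn

class DualTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        def decoder(out_ch):
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(32, out_ch, 2, stride=2),
            )
        self.seg_head = decoder(1)       # segmentation logits
        self.inpaint_head = decoder(1)   # reconstructed intensities

    def forward(self, x):
        z = self.encoder(x)              # shared representation
        return self.seg_head(z), self.inpaint_head(z)

net = DualTaskNet()
seg_logits, recon = net(torch.randn(2, 1, 64, 64))  # both (2, 1, 64, 64)
```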
- A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data.
Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront.
Our model generalizes well in the task of EEG sleep stage segmentation.
arXiv Detail & Related papers (2022-06-17T10:17:24Z)
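To make the task framing concrete (one class label per EEG time step), here is a deliberately simple per-step convolutional classifier; DETRtime itself is a detection-style architecture, which this does not attempt to reproduce.

```python
# Toy illustration of the task framing only: time-series segmentation
# assigns a class to every EEG time step. DETRtime's detection-style
# architecture is not reproduced here.
import torch
import torch.nn as nn

class TimeSeriesSegmenter(nn.Module):
    def __init__(self, n_channels=64, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(128, n_classes, kernel_size=1),  # per-step logits
        )

    def forward(self, x):      # x: (B, channels, T)
        return self.net(x)     # (B, n_classes, T): one label per time step

model = TimeSeriesSegmenter()
logits = model(torch.randn(2, 64, 500))  # 500 steps of 64-channel EEG
labels = logits.argmax(dim=1)            # (2, 500) segmentation
```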
- Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data [72.1187887376849]
The selective attention mechanism helps the cognition system focus on task-relevant visual clues by ignoring the presence of distractors.
We propose a framework to leverage gaze for medical image analysis tasks with small training data.
Our method is demonstrated to achieve superior performance on both 3D tumor segmentation and 2D chest X-ray classification tasks.
arXiv Detail & Related papers (2021-12-02T07:55:25Z)
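A common way to operationalize this, sketched below as a generic reading rather than the paper's exact method: augment the task loss with a KL term that keeps the network's spatial attention close to the recorded gaze heatmap.

```python
# Generic sketch of gaze as auxiliary supervision under scarce labels:
# the task loss gains a term keeping the network's spatial attention
# close to the recorded gaze heatmap. Details differ per paper.
import torch
import torch.nn.functional as F

def gaze_regularized_loss(task_logits, target, attn_map, gaze_map, lam=0.1):
    """attn_map, gaze_map: (B, 1, H, W), each normalized to sum to 1."""
    task = F.cross_entropy(task_logits, target)
    attn = attn_map.flatten(1)
    gaze = gaze_map.flatten(1)
    # KL(gaze || attention): penalize looking away from the expert.
    align = (gaze * (torch.log(gaze + 1e-8) -
                     torch.log(attn + 1e-8))).sum(1).mean()
    return task + lam * align

logits = torch.randn(4, 2)               # e.g., chest X-ray classification
target = torch.randint(0, 2, (4,))
attn = torch.rand(4, 1, 14, 14)
attn = attn / attn.flatten(1).sum(1)[:, None, None, None]
gaze = torch.rand(4, 1, 14, 14)
gaze = gaze / gaze.flatten(1).sum(1)[:, None, None, None]
loss = gaze_regularized_loss(logits, target, attn, gaze)
```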
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
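Purely as an illustration of what a unified symbolic representation can look like, here is a toy scene graph with typed nodes and timestamped relations; the MSSG paper's actual schema and 3D construction are not reproduced.

```python
# Purely illustrative: a surgical scene graph as typed nodes plus
# timestamped relations. Not the MSSG paper's actual schema.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    node_type: str            # e.g., "surgeon", "instrument", "anatomy"

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src, relation, dst, t)

    def add(self, node):
        self.nodes[node.node_id] = node

    def relate(self, src, relation, dst, t):
        self.edges.append((src, relation, dst, t))

g = SceneGraph()
g.add(Node("surgeon_1", "surgeon"))
g.add(Node("scalpel_1", "instrument"))
g.add(Node("liver", "anatomy"))
g.relate("surgeon_1", "holds", "scalpel_1", t=12.4)
g.relate("scalpel_1", "cuts", "liver", t=13.0)
```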
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.