Towards user-centered interactive medical image segmentation in VR with an assistive AI agent
- URL: http://arxiv.org/abs/2505.07214v3
- Date: Sun, 25 May 2025 01:26:38 GMT
- Title: Towards user-centered interactive medical image segmentation in VR with an assistive AI agent
- Authors: Pascal Spiegler, Arash Harirpoush, Yiming Xiao
- Abstract summary: We propose SAMIRA, a novel conversational AI agent for medical VR that assists users with localizing, segmenting, and visualizing 3D medical concepts. The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding. A user study demonstrated a high usability score (SUS = 90.0 $\pm$ 9.0), low overall task load, and strong support for the proposed VR system's guidance.
- Score: 0.5578116134031106
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Crucial in disease analysis and surgical planning, manual segmentation of volumetric medical scans (e.g., MRI, CT) is laborious, error-prone, and challenging to master, while fully automatic algorithms can benefit from user feedback. Therefore, with the complementary power of the latest radiological AI foundation models and virtual reality (VR)'s intuitive data interaction, we propose SAMIRA, a novel conversational AI agent for medical VR that assists users with localizing, segmenting, and visualizing 3D medical concepts. Through speech-based interaction, the agent helps users understand radiological features, locate clinical targets, and generate segmentation masks that can be refined with just a few point prompts. The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding. Furthermore, to determine the optimal interaction paradigm under near-far attention-switching for refining segmentation masks in an immersive, human-in-the-loop workflow, we compare VR controller pointing, head pointing, and eye tracking as input modes. A user study demonstrated a high usability score (SUS = 90.0 $\pm$ 9.0), low overall task load, and strong support for the proposed VR system's guidance, training potential, and integration of AI in radiological segmentation tasks.
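To make the refinement workflow concrete, here is a minimal sketch of the human-in-the-loop point-prompt loop the abstract describes. It assumes a SAM-style promptable segmenter behind a hypothetical `predict(volume, points, labels)` callable, and a `get_user_click` callback standing in for VR controller, head, or eye pointing; neither name comes from the paper.

```python
def refine_interactively(predict, volume, points, labels, get_user_click,
                         max_rounds=5):
    """Human-in-the-loop mask refinement as described in the abstract:
    start from initial point prompts, then let the user add corrective
    clicks until the mask is accepted.

    `predict(volume, points, labels) -> mask` is assumed to wrap a
    SAM-style promptable segmentation model (hypothetical interface);
    `get_user_click(mask)` returns ((x, y, z), is_foreground) or None.
    """
    points, labels = list(points), list(labels)
    mask = predict(volume, points, labels)
    for _ in range(max_rounds):
        click = get_user_click(mask)
        if click is None:               # user accepts the current mask
            break
        (x, y, z), is_foreground = click
        points.append((x, y, z))        # accumulate prompts across rounds
        labels.append(1 if is_foreground else 0)
        mask = predict(volume, points, labels)
    return mask
```

The loop mirrors the paper's few-point refinement claim: each corrective click re-runs prediction with the accumulated prompts rather than starting over.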
Related papers
- Ascribe New Dimensions to Scientific Data Visualization with VR [1.9084093324993718]
This article introduces ASCRIBE-VR, a VR platform of Autonomous Solutions for Computational Research with Immersive Browsing & Exploration.
ASCRIBE-VR enables multimodal analysis, structural assessments, and immersive visualization, supporting scientific visualization of advanced datasets such as X-ray CT, Magnetic Resonance, and synthetic 3D imaging.
arXiv Detail & Related papers (2025-04-18T03:59:39Z)
- MG-3D: Multi-Grained Knowledge-Enhanced 3D Medical Vision-Language Pre-training [7.968487067774351]
3D medical image analysis is pivotal in numerous clinical applications.
Large-scale vision-language pre-training remains underexplored in 3D medical image analysis.
We propose MG-3D, pre-trained on large-scale data (47.1K).
arXiv Detail & Related papers (2024-12-08T09:45:59Z)
- Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness [44.15562068190958]
In the Operating Room, semantic segmentation is at the core of creating robots aware of clinical surroundings.
State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable.
We propose a new 3D self-supervised task for OR scene understanding utilizing OR scene images captured with ToF cameras.
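As a rough illustration of how such a cluster-distance pretext task could look, the sketch below regresses distances to feature-space centroids; the centroid source, head design, and loss are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceHead(nn.Module):
    """Regresses an embedding's distance to each of K cluster centroids."""
    def __init__(self, dim: int, num_clusters: int):
        super().__init__()
        self.fc = nn.Linear(dim, num_clusters)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.fc(z)

def pretext_loss(online_embed: torch.Tensor, offline_embed: torch.Tensor,
                 centroids: torch.Tensor, head: DistanceHead) -> torch.Tensor:
    """online_embed: trainable backbone features of an OR image.
    offline_embed / centroids: features of the same image from a frozen
    extractor and the k-means centroids of that feature space, which
    together define label-free regression targets (assumed setup).
    """
    targets = torch.cdist(offline_embed, centroids)  # (N, K) distances
    return F.mse_loss(head(online_embed), targets)
```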
arXiv Detail & Related papers (2024-07-07T17:17:52Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject.
We show that our model captures the distinct functionalities of each region of the human vision system.
Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
- Multisensory extended reality applications offer benefits for volumetric biomedical image analysis in research and medicine [2.46537907738351]
3D data from high-resolution volumetric imaging is a central resource for diagnosis and treatment in modern medicine.
Recent research has used extended reality (XR) to perceive 3D images with visual depth and touch, but relied on restrictive haptic devices.
In this study, 24 experts in biomedical imaging from research and medicine explored 3D medical shapes with three applications.
arXiv Detail & Related papers (2023-11-07T13:37:47Z)
- ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding [67.21613160846299]
A new task, Embodied Reference Understanding (ERU), is first designed for this setting.
A new dataset, ScanERU, is constructed to evaluate the effectiveness of this idea.
arXiv Detail & Related papers (2023-03-23T11:36:14Z)
- Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT systems, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches, selected according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
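A minimal sketch of one way such gradient-based weighting could enter a patch reconstruction loss, assuming flattened voxel patches; this is an interpretation of the summary above, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def gradient_weighted_recon_loss(pred: torch.Tensor,
                                 target: torch.Tensor) -> torch.Tensor:
    """Reconstruction loss that up-weights informative patches.

    pred, target: (B, N, P) batches of N flattened voxel patches.
    The per-patch 'gradient metric' here is the mean absolute finite
    difference within the target patch, a simple proxy for content.
    """
    grad = (target[..., 1:] - target[..., :-1]).abs().mean(dim=-1)  # (B, N)
    weights = grad / grad.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    per_patch = F.mse_loss(pred, target, reduction="none").mean(dim=-1)
    return (weights * per_patch).sum(dim=-1).mean()
```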
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data [72.1187887376849]
The selective attention mechanism helps the cognitive system focus on task-relevant visual cues by ignoring distractors.
We propose a framework to leverage gaze for medical image analysis tasks with small training data.
Our method is demonstrated to achieve superior performance on both 3D tumor segmentation and 2D chest X-ray classification tasks.
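One plausible reading of "leveraging gaze" is an auxiliary loss that pulls the model's attention map toward the recorded gaze heatmap; the sketch below shows such a KL term under that assumption (the paper's actual mechanism may differ).

```python
import torch.nn.functional as F

def gaze_attention_loss(attn_map, gaze_heatmap):
    """KL term aligning the model's spatial attention with the
    radiologist's gaze distribution (illustrative auxiliary loss).

    attn_map: (B, H, W) unnormalized attention logits.
    gaze_heatmap: (B, H, W) non-negative gaze density.
    """
    p = gaze_heatmap.flatten(1)
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)  # target distribution
    log_q = F.log_softmax(attn_map.flatten(1), dim=1)   # model distribution
    return F.kl_div(log_q, p, reduction="batchmean")

# total = task_loss + lambda_gaze * gaze_attention_loss(attn, gaze)
```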
arXiv Detail & Related papers (2021-12-02T07:55:25Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
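As a toy illustration of graph-based visual-kinematics fusion, the sketch below treats the two embeddings as graph nodes exchanging one round of messages; MRG-Net's actual architecture is more elaborate, and all module names here are hypothetical.

```python
import torch
import torch.nn as nn

class TwoNodeFusion(nn.Module):
    """Visual and kinematics embeddings as two graph nodes that exchange
    messages before being fused (illustrative, not MRG-Net itself)."""
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # shared message function
        self.upd = nn.GRUCell(dim, dim)  # gated node update

    def forward(self, vis: torch.Tensor, kin: torch.Tensor) -> torch.Tensor:
        vis2 = self.upd(self.msg(kin), vis)      # kinematics -> visual
        kin2 = self.upd(self.msg(vis), kin)      # visual -> kinematics
        return torch.cat([vis2, kin2], dim=-1)   # fused (B, 2*dim) feature
```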
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Joint Scene and Object Tracking for Cost-Effective Augmented Reality Assisted Patient Positioning in Radiation Therapy [0.6299766708197884]
Research on Augmented Reality (AR) for patient positioning in radiation therapy is scarce.
We propose an efficient and cost-effective algorithm for tracking the scene and the patient to interactively assist the patient's positioning process.
arXiv Detail & Related papers (2020-10-05T10:20:46Z)
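Patient positioning of this kind ultimately reduces to estimating a rigid transform between tracked and planned poses. Below is a generic least-squares sketch (Kabsch/Procrustes) of that building block, assuming point correspondences are available; it is not the paper's specific tracking algorithm.

```python
import numpy as np

def rigid_align(source: np.ndarray, target: np.ndarray):
    """Least-squares rigid transform mapping tracked patient points
    onto their planned positions.

    source, target: (N, 3) corresponding 3D points.
    Returns R (3x3) and t (3,) such that R @ source[i] + t ~= target[i].
    """
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    H = (source - mu_s).T @ (target - mu_t)   # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t
```

The recovered (R, t) gives the correction needed to bring the patient into the planned treatment pose.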
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.