EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
- URL: http://arxiv.org/abs/2504.13065v1
- Date: Thu, 17 Apr 2025 16:19:05 GMT
- Title: EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
- Authors: Yang Yue, Yulin Wang, Haojun Jiang, Pan Liu, Shiji Song, Gao Huang
- Abstract summary: We present EchoWorld, a motion-aware world modeling framework for probe guidance. It encodes anatomical knowledge and motion-induced visual dynamics. It is trained on more than one million ultrasound images from over 200 routine scans.
- Score: 79.66329903007869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Echocardiography is crucial for cardiovascular disease detection but relies heavily on experienced sonographers. Echocardiography probe guidance systems, which provide real-time movement instructions for acquiring standard plane images, offer a promising solution for AI-assisted or fully autonomous scanning. However, developing effective machine learning models for this task remains challenging, as they must grasp heart anatomy and the intricate interplay between probe motion and visual signals. To address this, we present EchoWorld, a motion-aware world modeling framework for probe guidance that encodes anatomical knowledge and motion-induced visual dynamics, while effectively leveraging past visual-motion sequences to enhance guidance precision. EchoWorld employs a pre-training strategy inspired by world modeling principles, where the model predicts masked anatomical regions and simulates the visual outcomes of probe adjustments. Built upon this pre-trained model, we introduce a motion-aware attention mechanism in the fine-tuning stage that effectively integrates historical visual-motion data, enabling precise and adaptive probe guidance. Trained on more than one million ultrasound images from over 200 routine scans, EchoWorld effectively captures key echocardiographic knowledge, as validated by qualitative analysis. Moreover, our method significantly reduces guidance errors compared to existing visual backbones and guidance frameworks, excelling in both single-frame and sequential evaluation protocols. Code is available at https://github.com/LeapLabTHU/EchoWorld.
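The abstract names two mechanisms: a world-model-style pre-training objective in which the model simulates the visual outcome of a probe adjustment, and a motion-aware attention module that conditions guidance on the history of views and probe motions. Below is a minimal sketch of those two ideas for orientation only; it is not the authors' released implementation (see the GitHub link above for that), and the embedding size, the 6-DoF motion parameterization, and all module names are assumptions.

```python
# Illustrative sketch (not the EchoWorld code) of:
#   (1) a world-model head predicting the latent of the next view from the
#       current view latent and the applied probe motion, and
#   (2) a motion-aware attention block that pools a history of visual-motion
#       pairs before regressing the next probe adjustment.
# All dimensions and module names are assumptions.

import torch
import torch.nn as nn

LATENT_DIM = 256   # assumed size of the image embedding
MOTION_DIM = 6     # assumed 3 translations + 3 rotations of the probe


class MotionConditionedPredictor(nn.Module):
    """World-model head: predict the embedding of the view after a probe move."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + MOTION_DIM, 512),
            nn.GELU(),
            nn.Linear(512, LATENT_DIM),
        )

    def forward(self, z_t: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, motion], dim=-1))


class MotionAwareGuidanceHead(nn.Module):
    """Fine-tuning head: attend over past (view, motion) pairs, output next move."""

    def __init__(self, num_heads: int = 4):
        super().__init__()
        self.motion_proj = nn.Linear(MOTION_DIM, LATENT_DIM)
        self.attn = nn.MultiheadAttention(LATENT_DIM, num_heads, batch_first=True)
        self.regress = nn.Linear(LATENT_DIM, MOTION_DIM)

    def forward(self, z_now: torch.Tensor, z_hist: torch.Tensor,
                m_hist: torch.Tensor) -> torch.Tensor:
        # z_now: (B, 1, D) current view; z_hist: (B, T, D); m_hist: (B, T, MOTION_DIM)
        keys = z_hist + self.motion_proj(m_hist)      # motion-tagged history tokens
        ctx, _ = self.attn(query=z_now, key=keys, value=keys)
        return self.regress(ctx.squeeze(1))           # predicted probe adjustment


if __name__ == "__main__":
    B, T = 2, 8
    z_now = torch.randn(B, 1, LATENT_DIM)
    z_hist = torch.randn(B, T, LATENT_DIM)
    m_hist = torch.randn(B, T, MOTION_DIM)

    # World-model style loss: predict the current view's latent from the previous
    # view and the probe motion applied in between.
    predictor = MotionConditionedPredictor()
    z_pred = predictor(z_hist[:, -1], m_hist[:, -1])
    loss = nn.functional.mse_loss(z_pred, z_now.squeeze(1))

    # Guidance fine-tuning: regress the next probe motion toward a standard plane.
    guidance = MotionAwareGuidanceHead()
    next_move = guidance(z_now, z_hist, m_hist)
    print(loss.item(), next_move.shape)  # -> scalar, torch.Size([2, 6])
```

In such a setup, the predictor would be trained alongside the masked-region objective during pre-training, while the guidance head would be supervised at fine-tuning time by the probe motions recorded in the routine scans.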
Related papers
- EchoFM: Foundation Model for Generalizable Echocardiogram Analysis [22.585990526913246]
We introduce EchoFM, a foundation model specifically designed to represent and analyze echocardiography videos. In EchoFM, we propose a self-supervised learning framework that captures both spatial and temporal variability. We pre-train our model on an extensive dataset comprising over 290,000 echocardiography videos, with up to 20 million frames of images.
arXiv Detail & Related papers (2024-10-30T19:32:02Z) - EchoApex: A General-Purpose Vision Foundation Model for Echocardiography [9.202542805578432]
We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications across a variety of clinical practices.
Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres.
Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture.
arXiv Detail & Related papers (2024-10-14T21:10:56Z) - Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets [2.0286377328378737]
Heart diseases are the leading cause of death worldwide. According to the WHO, nearly 18 million people die each year from heart disease.
In this work, an automatic cardiac pathology recognition system based on a novel deep learning framework is proposed.
arXiv Detail & Related papers (2024-04-30T14:16:45Z) - Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a YOLOv5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
arXiv Detail & Related papers (2024-04-12T20:39:19Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z) - Intelligent Robotic Sonographer: Mutual Information-based Disentangled Reward Learning from Few Demonstrations [42.731081399649916]
This work proposes an intelligent robotic sonographer to autonomously "explore" target anatomies and navigate a US probe to a relevant 2D plane by learning from the expert.
The underlying high-level physiological knowledge from experts is inferred by a neural reward function.
The proposed advanced framework can robustly work on a variety of seen and unseen phantoms as well as in-vivo human carotid data.
arXiv Detail & Related papers (2023-07-07T16:30:50Z) - Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z) - A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its Effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities.
This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner.
We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
arXiv Detail & Related papers (2022-03-31T13:14:00Z) - Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information; a minimal fusion sketch follows this entry's link.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
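For the MRG-Net entry above, here is a hedged sketch of the general idea of fusing per-frame visual features with robot kinematics for gesture recognition. It substitutes plain cross-attention and a GRU for the paper's relational graph module, and every dimension and name below is an assumption rather than the published design.

```python
# Hedged sketch (not the MRG-Net code): fuse per-frame visual features with robot
# kinematics and classify surgical gestures per frame. Dimensions are assumptions.

import torch
import torch.nn as nn


class VisualKinematicsFusion(nn.Module):
    def __init__(self, vis_dim=512, kin_dim=14, hidden=128, num_gestures=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.kin_proj = nn.Linear(kin_dim, hidden)
        self.cross = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        self.classify = nn.Linear(hidden, num_gestures)

    def forward(self, vis_seq, kin_seq):
        # vis_seq: (B, T, vis_dim) frame features; kin_seq: (B, T, kin_dim) robot state
        v = self.vis_proj(vis_seq)
        k = self.kin_proj(kin_seq)
        fused, _ = self.cross(query=v, key=k, value=k)  # vision attends to kinematics
        out, _ = self.temporal(fused + v)               # temporal modeling of fused tokens
        return self.classify(out)                       # per-frame gesture logits (B, T, C)


if __name__ == "__main__":
    model = VisualKinematicsFusion()
    logits = model(torch.randn(2, 30, 512), torch.randn(2, 30, 14))
    print(logits.shape)  # torch.Size([2, 30, 10])
```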