Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
- URL: http://arxiv.org/abs/2405.01156v1
- Date: Thu, 2 May 2024 10:18:22 GMT
- Title: Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
- Authors: Saahil Islam, Venkatesh N. Murthy, Dominik Neumann, Badhan Kumar Das, Puneet Sharma, Andreas Maier, Dorin Comaniciu, Florin C. Ghesu
- Abstract summary: We propose a novel approach to learn procedural features from a very large data cohort of over 16 million interventional X-ray frames.
Our approach is based on a masked image modeling technique that leverages frame-based reconstruction to learn fine inter-frame temporal correspondences.
Experiments show that our method achieves a 66.31% reduction in maximum tracking error against reference solutions.
- Score: 6.262161803642583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, there is a need for high robustness, with no failures during tracking. To achieve that, one needs to efficiently tackle challenges such as device obscuration by contrast agent or other external devices or wires, changes in field-of-view or acquisition angle, and the continuous movement due to cardiac and respiratory motion. To overcome these challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked image modeling technique that leverages frame-interpolation-based reconstruction to learn fine inter-frame temporal correspondences. The features encoded in the resulting model are fine-tuned downstream. Our approach achieves state-of-the-art performance, and in particular robustness, compared to ultra-optimized reference solutions (that use multi-stage feature fusion, multi-task and flow regularization). The experiments show that our method achieves a 66.31% reduction in maximum tracking error against reference solutions (23.20% when flow regularization is used), achieving a success score of 97.95% at a 3x faster inference speed of 42 frames-per-second (on GPU). The results encourage the use of our approach in various other tasks within interventional image analytics that require effective understanding of spatio-temporal semantics.
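As a rough illustration of the masked image modeling idea mentioned in the abstract, the following Python sketch hides random patches in each frame of a short sequence and scores a reconstruction only on the hidden pixels. It is a generic sketch, not the authors' method: the abstract does not state the patch size, mask ratio, or how frame interpolation enters the reconstruction, so the function names `mask_frame_patches` and `masked_mse` and the parameters `patch` and `mask_ratio` are all assumptions chosen for illustration.

```python
import numpy as np


def mask_frame_patches(frames, patch=16, mask_ratio=0.75, seed=0):
    """Hide a random subset of square patches in every frame.

    frames: array of shape (T, H, W); H and W must be multiples of `patch`.
    Returns (masked_frames, pixel_mask), where pixel_mask is True on hidden pixels.
    Patch size and mask ratio are illustrative assumptions, not values from the paper.
    """
    t, h, w = frames.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    rng = np.random.default_rng(seed)
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_hidden = int(round(mask_ratio * n_patches))

    # Draw an independent set of hidden patches for each frame.
    patch_mask = np.zeros((t, n_patches), dtype=bool)
    for i in range(t):
        hidden = rng.choice(n_patches, size=n_hidden, replace=False)
        patch_mask[i, hidden] = True
    patch_mask = patch_mask.reshape(t, gh, gw)

    # Expand the patch-level mask to pixel resolution.
    pixel_mask = np.repeat(np.repeat(patch_mask, patch, axis=1), patch, axis=2)

    masked = frames.copy()
    masked[pixel_mask] = 0  # hidden pixels are zeroed out
    return masked, pixel_mask


def masked_mse(prediction, target, pixel_mask):
    """Mean squared error computed only over the hidden pixels."""
    return float(((prediction - target) ** 2)[pixel_mask].mean())


if __name__ == "__main__":
    clip = np.random.default_rng(1).random((4, 64, 64))  # stand-in for X-ray frames
    masked_clip, mask = mask_frame_patches(clip, patch=16, mask_ratio=0.75)
    # A real model would predict the hidden content; here the masked input is scored directly.
    print("hidden fraction:", mask.mean())
    print("loss of trivial prediction:", masked_mse(masked_clip, clip, mask))
```

In a pretraining setup of this kind, a network would receive the masked frames and be trained to minimise the loss on the hidden pixels, which pushes it to use information from the visible patches and neighbouring frames.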
Related papers
- Multi-Scale Feature Fusion with Image-Driven Spatial Integration for Left Atrium Segmentation from Cardiac MRI Images [0.0]
We propose a framework that integrates DINOv2 as an encoder with a UNet-style decoder, incorporating multi-scale feature fusion and input image integration.
We validate our approach on the LAScarQS 2022 dataset and demonstrate improved performance, with a 92.3% Dice and 84.1% IoU score for the giant architecture.
arXiv Detail & Related papers (2025-02-10T16:12:46Z)
- A Novel Tracking Framework for Devices in X-ray Leveraging Supplementary Cue-Driven Self-Supervised Features [6.262161803642583]
We propose a self-supervised learning approach that enhances its spatio-temporal visibility.
We introduce a generic real-time tracking framework that effectively leverages the pretrained spatio-temporal network.
Our method achieves an 87% reduction in max error for balloon marker detection and a 61% reduction in max error for catheter tip detection.
arXiv Detail & Related papers (2025-01-22T15:32:07Z)
- Efficient Frame Extraction: A Novel Approach Through Frame Similarity and Surgical Tool Tracking for Video Segmentation [1.6092864505858449]
We propose a technique that can efficiently eliminate redundant frames to reduce dataset size and computation time.
Specifically, we compute the similarity between consecutive frames by tracking the movement of surgical tools.
By adaptively selecting relevant frames, we achieve a tenfold reduction in the number of frames while improving accuracy by 4.32%.
arXiv Detail & Related papers (2025-01-19T19:36:09Z)
- CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation [22.886841531680567]
Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning.
However, inherent physical and technical constraints of imaging hardware often necessitate a compromise between temporal resolution and image quality.
We propose a novel approach for continuously modeling patient anatomic motion using implicit neural representation.
arXiv Detail & Related papers (2024-05-24T09:35:42Z)
- Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a YOLOv5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
arXiv Detail & Related papers (2024-04-12T20:39:19Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE (Cross-modal Linear Embedding, a component of CMG) learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z)
- GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- Spatial gradient consistency for unsupervised learning of hyperspectral demosaicking: Application to surgical imaging [4.795951381086172]
Hyperspectral imaging has the potential to improve tissue characterisation in real time and with high resolution.
A demosaicking algorithm is required to fully recover the spatial and spectral information of the snapshot images.
We present a fully unsupervised hyperspectral image demosaicking algorithm which only requires snapshot images for training purposes.
arXiv Detail & Related papers (2023-02-21T18:07:14Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.