Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
- URL: http://arxiv.org/abs/2405.01156v1
- Date: Thu, 2 May 2024 10:18:22 GMT
- Authors: Saahil Islam, Venkatesh N. Murthy, Dominik Neumann, Badhan Kumar Das, Puneet Sharma, Andreas Maier, Dorin Comaniciu, Florin C. Ghesu
- Abstract summary: We propose a novel approach to learn procedural features from a very large data cohort of over 16 million interventional X-ray frames.
Our approach is based on a masked image modeling technique that leverages frame-interpolation-based reconstruction to learn fine inter-frame temporal correspondences.
Experiments show that our method achieves 66.31% reduction in maximum tracking error against reference solutions.
- Score: 6.262161803642583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, tracking must be highly robust, with no failures. To achieve that, one needs to efficiently tackle challenges such as: device obscuration by contrast agent or by other external devices or wires, changes in field-of-view or acquisition angle, and continuous movement due to cardiac and respiratory motion. To overcome these challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked image modeling technique that leverages frame-interpolation-based reconstruction to learn fine inter-frame temporal correspondences. The features encoded in the resulting model are fine-tuned downstream. Our approach achieves state-of-the-art performance, and in particular robustness, compared to highly optimized reference solutions (that use multi-stage feature fusion, multi-task learning, and flow regularization). The experiments show that our method achieves a 66.31% reduction in maximum tracking error against reference solutions (23.20% when flow regularization is used), achieving a success score of 97.95% at a 3x faster inference speed of 42 frames per second (on GPU). The results encourage the use of our approach in various other tasks within interventional image analytics that require effective understanding of spatio-temporal semantics.
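The core idea of the pretext task, masking patches of a frame and reconstructing them from temporally adjacent frames, can be sketched in a few lines. The snippet below is a minimal illustration in NumPy, not the paper's actual model: the learned encoder is replaced by a simple average of the neighboring frames, and the helper names (`mask_patches`, `interpolation_reconstruction_loss`) are hypothetical.

```python
import numpy as np

def mask_patches(frame, patch=4, ratio=0.6, rng=None):
    """Randomly zero out square patches of a 2D frame.

    Returns the masked frame and a boolean mask of the hidden pixels."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = frame.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            if rng.random() < ratio:
                mask[i:i + patch, j:j + patch] = True
    masked = frame.copy()
    masked[mask] = 0.0
    return masked, mask

def interpolation_reconstruction_loss(prev_frame, masked_mid, next_frame,
                                      target_mid, mask):
    """Stand-in 'model': fill the masked patches of the middle frame by
    interpolating the temporally adjacent frames, then score MSE on the
    masked pixels only (the self-supervised reconstruction objective)."""
    recon = masked_mid.copy()
    recon[mask] = 0.5 * (prev_frame[mask] + next_frame[mask])
    loss = float(np.mean((recon[mask] - target_mid[mask]) ** 2))
    return recon, loss
```

In the paper's setting a learned spatio-temporal encoder replaces the naive interpolation, but the objective has the same shape: the loss is computed only over the masked regions, which forces the model to exploit inter-frame correspondences.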
Related papers
- Unsupervised Training of a Dynamic Context-Aware Deep Denoising Framework for Low-Dose Fluoroscopic Imaging [6.130738760059542]
Fluoroscopy is critical for real-time X-ray visualization in medical imaging.
Low-dose images are compromised by noise, potentially affecting diagnostic accuracy.
We propose an unsupervised training framework for dynamic context-aware denoising of fluoroscopy image sequences.
arXiv Detail & Related papers (2024-10-29T13:39:31Z) - CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation [22.886841531680567]
Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning.
However, inherent physical and technical constraints of imaging hardware often necessitate a compromise between temporal resolution and image quality.
We propose a novel approach for continuously modeling patient anatomic motion using implicit neural representation.
arXiv Detail & Related papers (2024-05-24T09:35:42Z) - Goal-conditioned reinforcement learning for ultrasound navigation guidance [4.648318344224063]
We propose a novel ultrasound navigation assistance method based on contrastive learning as goal-conditioned reinforcement learning.
We augment the previous framework using a novel contrastive patient method (CPB) and a data-augmented contrastive loss.
Our method was developed with a large dataset of 789 patients and obtained an average error of 6.56 mm in position and 9.36 degrees in angle.
arXiv Detail & Related papers (2024-05-02T16:01:58Z) - Attention-aware non-rigid image registration for accelerated MR imaging [10.47044784972188]
We introduce an attention-aware deep learning-based framework that can perform non-rigid pairwise registration for fully sampled and accelerated MRI.
We extract local visual representations to build similarity maps between the registered image pairs at multiple resolution levels.
We demonstrate that our model derives reliable and consistent motion fields across different sampling trajectories.
arXiv Detail & Related papers (2024-04-26T14:25:07Z) - Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a YOLOv5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
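The two-stage detect-then-segment pipeline described above can be sketched as follows. This is a rough, hedged illustration: trivial intensity thresholding stands in for both the trained YOLOv5 detector and the segmentation network, and the function names and thresholds are illustrative assumptions, not the paper's code.

```python
import numpy as np

def detect_boxes(image, thresh=0.5):
    """Stage 1 stand-in for the trained detector: return one bounding box
    (y0, x0, y1, x1) around all pixels above a brightness threshold."""
    ys, xs = np.nonzero(image > thresh)
    if ys.size == 0:
        return []
    return [(int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1)]

def segment_in_box(image, box, thresh=0.5):
    """Stage 2 stand-in: produce a binary guidewire mask restricted to the
    detected bounding box, so the segmenter only sees a small crop."""
    y0, x0, y1, x1 = box
    mask = np.zeros_like(image, dtype=bool)
    mask[y0:y1, x0:x1] = image[y0:y1, x0:x1] > thresh
    return mask

def track_frame(image):
    """Run the two-stage pipeline on a single frame: detect, then segment."""
    return [segment_in_box(image, box) for box in detect_boxes(image)]
```

Restricting segmentation to the detected boxes is what makes the second stage cheap enough for real-time use; in the paper both stages are learned networks rather than thresholds.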
arXiv Detail & Related papers (2024-04-12T20:39:19Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z) - AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z) - GLSFormer: Gated-Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - Spatial gradient consistency for unsupervised learning of hyperspectral demosaicking: Application to surgical imaging [4.795951381086172]
Hyperspectral imaging has the potential to improve tissue characterisation in real-time and with high-resolution.
A demosaicking algorithm is required to fully recover the spatial and spectral information of the snapshot images.
We present a fully unsupervised hyperspectral image demosaicking algorithm which only requires snapshot images for training purposes.
arXiv Detail & Related papers (2023-02-21T18:07:14Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.