DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy
- URL: http://arxiv.org/abs/2403.01683v2
- Date: Fri, 15 Mar 2024 07:25:48 GMT
- Title: DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy
- Authors: Qingyao Tian, Huai Liao, Xinyan Huang, Jian Chen, Zihui Zhang, Bingyu Yang, Sebastien Ourselin, Hongbin Liu
- Abstract summary: We propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB).
The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization.
Experiments on phantom and in-vivo data from patients demonstrate the effectiveness of our framework.
- Score: 5.8722774441994074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time 6-DOF localization of bronchoscopes is crucial for enhancing intervention quality. However, current vision-based technologies struggle to balance generalization to unseen data against computational speed. In this study, we propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB) that generalizes across patient cases without the need for re-training. The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization. To address the domain gap among patients, we propose a knowledge-embedded depth estimation network that maps endoscope frames to depth, ensuring generalization by eliminating patient-specific textures. The network embeds view-synthesis knowledge into a cycle adversarial architecture for scale-constrained monocular depth estimation. For real-time performance, our localization module embeds a fast ego-motion estimation network into the loop of depth registration. The ego-motion inference network estimates the pose change of the bronchoscope at high frequency, while depth registration against the pre-operative 3D model provides the absolute pose periodically. Specifically, the relative pose changes are fed into the registration process as an initial guess to boost its accuracy and speed. Experiments on phantom and in-vivo patient data demonstrate the effectiveness of our framework: 1) monocular depth estimation outperforms the state of the art (SOTA), 2) localization achieves an Absolute Tracking Error (ATE) of 4.7 $\pm$ 3.17 mm on phantom data and 6.49 $\pm$ 3.88 mm on patient data, 3) the frame rate approaches video capture speed, and 4) no case-wise network retraining is required. The framework's superior speed and accuracy demonstrate its promising clinical potential for real-time bronchoscopic navigation.
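To make the dual-loop localization described above concrete, the following is a minimal sketch of how a fast ego-motion loop and a periodic depth-registration loop could be combined, with the accumulated pose estimate serving as the initial guess for each registration. The callables `ego_net`, `depth_net`, and `register_to_ct`, as well as the correction period, are hypothetical placeholders standing in for the paper's actual networks and registration routine.

```python
import numpy as np

def dual_loop_localization(frames, ego_net, depth_net, register_to_ct, period=5):
    """Sketch of a dual-loop localizer (not the authors' implementation).

    Inner loop: an ego-motion network predicts the frame-to-frame pose change
    at high frequency. Outer loop: every `period` frames, monocular depth is
    registered to the pre-operative 3D model to correct accumulated drift,
    seeded with the current estimate as its initial guess.
    """
    pose = np.eye(4)            # absolute camera pose as a 4x4 homogeneous matrix
    trajectory = []
    prev_frame = None
    for i, frame in enumerate(frames):
        if prev_frame is not None:
            delta = ego_net(prev_frame, frame)   # relative 4x4 pose change
            pose = pose @ delta                  # high-frequency pose update
        if i % period == 0:
            depth = depth_net(frame)             # patient-agnostic depth map
            # The drifting estimate initializes the registration, which boosts
            # its speed and accuracy; the result replaces the absolute pose.
            pose = register_to_ct(depth, init_pose=pose)
        trajectory.append(pose.copy())
        prev_frame = frame
    return trajectory
```

In practice the registration step would likely run in parallel with the inner loop so the per-frame rate stays near video capture speed; the sketch keeps both loops sequential for clarity.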
Related papers
- Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos [12.780908780402516]
We present the first video classification framework for automatic carotid stenosis grading (CSG).
We propose a novel and effective video classification network for weakly-supervised CSG.
Our approach is extensively validated on a large clinically collected carotid US video dataset.
arXiv Detail & Related papers (2023-06-05T02:50:06Z)
- CSDN: Combing Shallow and Deep Networks for Accurate Real-time Segmentation of High-definition Intravascular Ultrasound Images [4.062948258086793]
We propose a two-stream framework for efficient segmentation of 60 MHz high resolution IVUS images.
It combines a shallow network and a deep network, hence the name CSDN.
Treating shallow and deep information separately enables the learned model to achieve both high accuracy and high efficiency for real-time segmentation.
arXiv Detail & Related papers (2023-01-30T14:42:48Z)
- Accurate and Real-time Pseudo Lidar Detection: Is Stereo Neural Network Really Necessary? [6.8067583993953775]
We develop a system with a less powerful stereo matching predictor and adopt the proposed refinement schemes to improve the accuracy.
The presented system achieves accuracy competitive with state-of-the-art approaches with only 23 ms of computation, showing it is a suitable candidate for deployment in real in-vehicle applications.
arXiv Detail & Related papers (2022-06-28T09:53:00Z)
- Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network [9.349668170221975]
We develop an unsupervised deep learning-based framework to correct inter-frame body motion.
The motion estimation network is a convolutional neural network with a combined convolutional long short-term memory layer.
Once trained, motion estimation inference with our proposed network was around 460 times faster than the conventional registration baseline.
arXiv Detail & Related papers (2022-06-13T17:38:16Z)
- Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Enhancing Fiber Orientation Distributions using convolutional Neural Networks [0.0]
We learn improved fiber orientation distributions (FODs) for commercially acquired MRI.
We evaluate patch-based 3D convolutional neural networks (CNNs).
Our approach may enable robust constrained spherical deconvolution (CSD) model estimation on single-shell dMRI acquisition protocols.
arXiv Detail & Related papers (2020-08-12T16:06:25Z)
- 4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images.
We extend 3D CNNs to 4D-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)
- AutoHR: A Strong End-to-end Baseline for Remote Heart Rate Measurement with Neural Searching [76.4844593082362]
We investigate why existing end-to-end networks perform poorly in challenging conditions and establish a strong baseline for remote HR measurement with neural architecture search (NAS).
Comprehensive experiments are performed on three benchmark datasets under both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2020-04-26T05:43:21Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)