V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy
- URL: http://arxiv.org/abs/2412.17595v1
- Date: Mon, 23 Dec 2024 14:11:30 GMT
- Title: V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy
- Authors: Long Bai, Beilei Cui, Liangyu Wang, Yanheng Li, Shilong Yao, Sishen Yuan, Yanan Wu, Yang Zhang, Max Q.-H. Meng, Zhen Li, Weiping Ding, Hongliang Ren
- Abstract summary: Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, aiding in 3D scene reconstruction and lesion localization. Existing solutions focus solely on vision-based processing, neglecting other auxiliary signals like vibrations. We propose V$^2$-SfMLearner, a multimodal approach integrating vibration signals into vision-based depth and capsule motion estimation.
- Score: 37.63512910531616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, aiding in 3D scene reconstruction and lesion localization. However, collisions of the capsule endoscope within the gastrointestinal tract cause vibration perturbations in the training data. Existing solutions focus solely on vision-based processing, neglecting auxiliary signals like vibrations that could reduce noise and improve performance. Therefore, we propose V$^2$-SfMLearner, a multimodal approach integrating vibration signals into vision-based depth and capsule motion estimation for monocular capsule endoscopy. We construct a multimodal capsule endoscopy dataset containing vibration and visual signals, and develop an unsupervised method using vision-vibration signals, effectively eliminating vibration perturbations through multimodal learning. Specifically, we carefully design a vibration network branch and a Fourier fusion module to detect and mitigate vibration noise. The fusion framework is compatible with popular vision-only algorithms. Extensive validation on the multimodal dataset demonstrates superior performance and robustness over vision-only algorithms. Without the need for large external equipment, V$^2$-SfMLearner has the potential for integration into clinical capsule robots, providing real-time and dependable digestive examination tools. The findings show promise for practical implementation in clinical settings, enhancing the diagnostic capabilities of doctors.
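The abstract does not specify the internals of the Fourier fusion module, but the general idea of fusing two feature streams in the frequency domain can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the function name `fourier_fuse`, the amplitude-blending scheme, and the weight `alpha` are all hypothetical.

```python
import numpy as np

def fourier_fuse(vision_feat, vibration_feat, alpha=0.5):
    """Blend two same-shaped 2-D feature maps in the frequency domain.

    Hypothetical sketch: the amplitude spectra are mixed with weight
    `alpha`, while the phase of the vision branch is kept so that the
    spatial structure of the image features is preserved.
    """
    V = np.fft.fft2(vision_feat)
    S = np.fft.fft2(vibration_feat)
    # Mix amplitudes; keep vision phase (structure lives in the phase).
    amp = (1.0 - alpha) * np.abs(V) + alpha * np.abs(S)
    phase = np.angle(V)
    fused = amp * np.exp(1j * phase)
    # The inverse transform of a real-valued blend has a negligible
    # imaginary part, which we discard.
    return np.real(np.fft.ifft2(fused))
```

With `alpha=0` the function reduces to an identity on the vision features, which makes it easy to sanity-check; how the actual module weights and combines the two branches inside the network is a design choice left to the paper.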
Related papers
- eNCApsulate: NCA for Precision Diagnosis on Capsule Endoscopes [1.3270838622986498]
Wireless Capsule Endoscopy is a pain-free alternative to traditional endoscopy.
Techniques like bleeding detection and depth estimation can help with localization of pathologies, but deep learning models are typically too large to run directly on the capsule.
We distill a large foundation model into the lean NCA architecture, by treating the outputs of the foundation model as pseudo ground truth.
We then port the trained NCA to the ESP32 microcontroller, enabling efficient image processing on hardware as small as a camera capsule.
arXiv Detail & Related papers (2025-04-30T12:06:56Z)
- EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance.
It encodes anatomical knowledge and motion-induced visual dynamics.
It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z)
- Vascular Segmentation of Functional Ultrasound Images using Deep Learning [0.0]
We introduce the first deep learning-based segmentation tool for functional ultrasound (fUS) images.
We achieve competitive segmentation performance, with 90% accuracy, 71% robustness, and an IoU of 0.59, using only 100 temporal frames from a fUS stack.
This work offers a non-invasive, cost-effective alternative to localization microscopy, enhancing fUS data interpretation and improving understanding of vessel function.
arXiv Detail & Related papers (2024-10-28T09:00:28Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training.
We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue.
We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch.
This improves ease of use and scales reconstruction to surgical videos of 5,000 frames or more, an improvement of over ten times compared to the state of the art, while remaining agnostic to external tracking information.
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
- AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed under the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z) - LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved
Wavelet Attention and Reverse Diffusion [24.560417980602928]
Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases.
Deep learning-based low-light image enhancement (LLIE) has gradually attracted attention in the medical field.
We introduce a WCE LLIE framework based on the multi-scale convolutional neural network (CNN) and reverse diffusion process.
arXiv Detail & Related papers (2023-07-05T17:23:42Z) - Adversarial Distortion Learning for Medical Image Denoising [43.53912137735094]
We present a novel adversarial distortion learning (ADL) for denoising two- and three-dimensional (2D/3D) biomedical image data.
The proposed ADL consists of two auto-encoders: a denoiser and a discriminator.
Both the denoiser and the discriminator are built upon a proposed auto-encoder called Efficient-Unet.
arXiv Detail & Related papers (2022-04-29T13:47:39Z)
- Deep Learning for Ultrasound Beamforming [120.12255978513912]
Beamforming, the process of mapping received ultrasound echoes to the spatial image domain, lies at the heart of the ultrasound image formation chain.
Modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing.
Deep learning methods can play a compelling role in the digital beamforming pipeline.
arXiv Detail & Related papers (2021-09-23T15:15:21Z)
- Multi-Disease Detection in Retinal Imaging based on Ensembling Heterogeneous Deep Learning Models [0.0]
We propose an innovative multi-disease detection pipeline for retinal imaging.
Our pipeline includes state-of-the-art strategies like transfer learning, class weighting, real-time image augmentation and Focal loss utilization.
arXiv Detail & Related papers (2021-03-26T18:02:17Z)
- VR-Caps: A Virtual Environment for Capsule Endoscopy [8.499489366784374]
Current capsule endoscopes and next-generation robotic capsules for diagnosis and treatment of gastrointestinal diseases are complex cyber-physical platforms.
Data-driven algorithms promise to enable many advanced functionalities for capsule endoscopes, but real-world data is challenging to obtain.
Physically-realistic simulations providing synthetic data have emerged as a solution for developing data-driven algorithms.
arXiv Detail & Related papers (2020-08-29T09:54:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.