A geometry-aware deep network for depth estimation in monocular
endoscopy
- URL: http://arxiv.org/abs/2304.10241v1
- Date: Thu, 20 Apr 2023 11:59:32 GMT
- Title: A geometry-aware deep network for depth estimation in monocular
endoscopy
- Authors: Yongming Yang, Shuwei Shao, Tao Yang, Peng Wang, Zhuo Yang, Chengdong
Wu, Hao Liu
- Abstract summary: The proposed method is extensively validated across different datasets and clinical images.
To demonstrate generalizability, the method further achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (T3-L3) on the ColonDepth dataset.
- Score: 17.425158094539462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation is critical for endoscopists to perform spatial
perception and 3D navigation of surgical sites. However, most of the existing
methods ignore the important geometric structural consistency, which inevitably
leads to performance degradation and distortion of 3D reconstruction. To
address this issue, we introduce a gradient loss to penalize ambiguous edge
fluctuations around stepped edge structures, a normal loss to explicitly
express sensitivity to frequently occurring small structures, and a geometric
consistency loss that spreads spatial information across the sample grids to
constrain the global geometric anatomy. In addition, we develop a
synthetic RGB-Depth dataset that captures the anatomical structures under
reflections and illumination variations. The proposed method is extensively
validated across different datasets and clinical images and achieves mean RMSE
values of 0.066 (stomach), 0.029 (small intestine), and 0.139 (colon) on the
EndoSLAM dataset. To demonstrate generalizability, it further achieves mean
RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (T3-L3) on the
ColonDepth dataset. The experimental results show that our method exceeds
previous state-of-the-art competitors and generates more consistent depth maps
and reasonable anatomical structures. The quality of intraoperative 3D
structure perception from endoscopic videos of the proposed method meets the
accuracy requirements of video-CT registration algorithms for endoscopic
navigation. The dataset and the source code will be available at
https://github.com/YYM-SIA/LINGMI-MR.
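To make the geometry-aware supervision concrete, the sketch below shows one common way to implement a depth-gradient loss and a surface-normal loss between predicted and ground-truth depth maps. This is a minimal sketch assuming PyTorch; the paper's exact formulations, loss weights, and geometric consistency loss are not reproduced here, and all function and variable names are illustrative.

```python
# Minimal sketch (PyTorch assumed) of gradient and surface-normal losses of the
# kind described in the abstract. Formulations are illustrative, not the
# paper's exact definitions.
import torch
import torch.nn.functional as F

def image_gradients(depth):
    """Finite-difference gradients of a depth map with shape (B, 1, H, W)."""
    dx = depth[:, :, :, 1:] - depth[:, :, :, :-1]   # horizontal differences
    dy = depth[:, :, 1:, :] - depth[:, :, :-1, :]   # vertical differences
    return dx, dy

def gradient_loss(pred, gt):
    """Penalize edge fluctuations by matching depth gradients."""
    pred_dx, pred_dy = image_gradients(pred)
    gt_dx, gt_dy = image_gradients(gt)
    return (pred_dx - gt_dx).abs().mean() + (pred_dy - gt_dy).abs().mean()

def normals_from_depth(depth):
    """Surface normals derived from depth gradients (orthographic approximation)."""
    dx, dy = image_gradients(depth)
    # Crop so dx and dy align on a common (H-1, W-1) grid.
    dx, dy = dx[:, :, 1:, :], dy[:, :, :, 1:]
    ones = torch.ones_like(dx)
    n = torch.cat([-dx, -dy, ones], dim=1)          # (B, 3, H-1, W-1)
    return F.normalize(n, dim=1)

def normal_loss(pred, gt):
    """Penalize angular deviation between predicted and ground-truth normals."""
    n_pred, n_gt = normals_from_depth(pred), normals_from_depth(gt)
    return (1.0 - (n_pred * n_gt).sum(dim=1)).mean()

# Example usage on dummy depth maps:
pred = torch.rand(2, 1, 64, 64, requires_grad=True)
gt = torch.rand(2, 1, 64, 64)
loss = gradient_loss(pred, gt) + normal_loss(pred, gt)
loss.backward()
```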
Related papers
- ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation [67.22294293695255]
We propose a novel reconstruction pipeline with a bi-directional adaptation architecture, named ToDER, to obtain precise depth estimates.
Experimental results demonstrate that our approach can precisely predict depth maps in both realistic and synthetic colonoscopy videos.
arXiv Detail & Related papers (2024-07-23T14:24:26Z) - A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy
from Monocular Endoscopic Video [8.32570164101507]
We perform a quantitative analysis of a self-supervised approach for sinus reconstruction using endoscopic sequences and optical tracking.
Our results show that the generated reconstructions are in high agreement with the anatomy, yielding an average point-to-mesh error of 0.91 mm.
We identify that pose and depth estimation inaccuracies contribute equally to this error and that locally consistent sequences with shorter trajectories generate more accurate reconstructions.
arXiv Detail & Related papers (2023-10-22T17:11:40Z) - $E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in
Diffusion MRI [4.726777092009554]
We present a framework for sparse deconvolution of volumes where each voxel contains a spherical signal.
This work constructs equivariant deep learning layers that respect the symmetries of spatial rotations, reflections, and translations.
arXiv Detail & Related papers (2023-04-12T18:37:32Z) - Enforcing connectivity of 3D linear structures using their 2D
projections [54.0598511446694]
We propose to improve the 3D connectivity of our results by minimizing a sum of topology-aware losses on their 2D projections.
This suffices to increase accuracy and to reduce the effort needed to provide the annotated training data.
arXiv Detail & Related papers (2022-07-14T11:42:18Z) - Deep Implicit Statistical Shape Models for 3D Medical Image Delineation [47.78425002879612]
3D delineation of anatomical structures is a cardinal goal in medical imaging analysis.
Prior to deep learning, statistical shape models that imposed anatomical constraints and produced high quality surfaces were a core technology.
We present deep implicit statistical shape models (DISSMs), a new approach to delineation that marries the representation power of CNNs with the robustness of SSMs.
arXiv Detail & Related papers (2021-04-07T01:15:06Z) - Revisiting 3D Context Modeling with Supervised Pre-training for
Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z) - 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial
Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z) - Deep Volumetric Universal Lesion Detection using Light-Weight Pseudo 3D
Convolution and Surface Point Regression [23.916776570010285]
Computer-aided lesion/significant-findings detection techniques are at the core of medical imaging.
We propose a novel deep anchor-free one-stage VULD framework that incorporates (1) P3DC operators to recycle the architectural configurations and pre-trained weights of off-the-shelf 2D networks, and (2) a new SPR method to effectively regress the 3D lesion spatial extents by pinpointing their representative key points on lesion surfaces.
arXiv Detail & Related papers (2020-08-30T19:42:06Z) - Appearance Learning for Image-based Motion Estimation in Tomography [60.980769164955454]
In tomographic imaging, anatomical structures are reconstructed by applying a pseudo-inverse forward model to acquired signals.
Patient motion corrupts the geometry alignment in the reconstruction process resulting in motion artifacts.
We propose an appearance learning approach recognizing the structures of rigid motion independently from the scanned object.
arXiv Detail & Related papers (2020-06-18T09:49:11Z) - Limited Angle Tomography for Transmission X-Ray Microscopy Using Deep
Learning [12.991428974915795]
Deep learning is applied to limited angle reconstruction in X-ray microscopy for the first time.
The U-Net, the state-of-the-art neural network in biomedical imaging, is trained on synthetic data.
The proposed method remarkably improves the 3-D visualization of the subcellular structures in the chlorella cell.
arXiv Detail & Related papers (2020-01-08T12:11:19Z)